lenskit.data.Vocabulary
- class lenskit.data.Vocabulary(keys=None, name=None)
Bases: object
Vocabularies of terms, tags, entity IDs, etc. for the LensKit data model.
This class supports bidirectional mappings between key-like data and contiguous nonnegative integer indices. Its key use is to facilitate the user and item ID vocabularies in Dataset, but it can also be used for things like item tags.
It is currently a wrapper around pandas.Index, but supports adding vocabulary terms after the vocabulary has been created. New terms do not change the index positions of previously-known identifiers.
- Parameters:
keys (IDSequence | pd.Index | Iterable[ID] | None)
name (str | None)
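The append-only numbering described above can be illustrated without LensKit itself. The following is a minimal pure-Python sketch: `ToyVocabulary` is a hypothetical stand-in built from a list and a dict, not the library's pandas-backed implementation. How it assigns the initial numbers (insertion order) and how it treats already-known terms (skipped) are assumptions made for the illustration. It is only meant to show why adding terms leaves existing numbers unchanged.

```python
class ToyVocabulary:
    """Hypothetical stand-in for illustration; NOT the LensKit class."""

    def __init__(self, keys=None):
        self._terms = []     # number -> term
        self._numbers = {}   # term -> number
        if keys is not None:
            self.add_terms(keys)

    def add_terms(self, terms):
        # New terms are appended at the end, so earlier numbers never shift.
        # Assumption: terms that are already known are silently skipped.
        for t in terms:
            if t not in self._numbers:
                self._numbers[t] = len(self._terms)
                self._terms.append(t)

    def number(self, term):
        return self._numbers[term]

    def term(self, num):
        if num < 0:
            raise IndexError("negative indexing is not supported")
        return self._terms[num]


vocab = ToyVocabulary(["u10", "u42", "u7"])
before = vocab.number("u42")
vocab.add_terms(["u99", "u10"])   # "u10" is already known
after = vocab.number("u42")
```

Here `before` and `after` are the same number, and the new term `"u99"` receives the next free index.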
- __init__(keys=None, name=None)
- Parameters:
keys (IDSequence | pd.Index | Iterable[ID] | None)
name (str | None)
Methods
__init__([keys, name])
add_terms(terms)
compatible_with_numbers_from(other): Check if this vocabulary is compatible with numbers from another vocabulary.
copy(): Return a (cheap) copy of this vocabulary.
id(num): Alias for term() for greater readability for entity ID vocabularies.
ids([nums]): Alias for terms() for greater readability for entity ID vocabularies.
number(): Look up the number for a vocabulary term.
numbers(terms[, missing]): Look up the numbers for an array of terms or IDs.
term(num): Look up the term with a particular number.
terms([nums]): Get a list of terms, optionally for an array of term numbers.
Attributes
The property as a Pandas index.
Current vocabulary size.
The name of the vocabulary (e.g. “user”, “item”).
- number(term: object, missing: Literal['error'] = 'error') → int
- number(term: object, missing: Literal['none'] | None) → int | None
Look up the number for a vocabulary term.
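The two overloads above differ only in how a missing term is handled. The sketch below mimics that contract with a plain dict. `lookup_number` is a hypothetical helper, not part of LensKit, and the use of `KeyError` for the `'error'` case is an assumption, since this page does not say which exception is raised.

```python
def lookup_number(numbers, term, missing="error"):
    """Hypothetical helper mirroring the documented overloads of number()."""
    try:
        return numbers[term]
    except KeyError:
        if missing == "error":
            # Assumption: the exception type is not stated on this page.
            raise
        # missing="none" or missing=None: report absence with None.
        return None


numbers = {"item-a": 0, "item-b": 1}
known = lookup_number(numbers, "item-b")
unknown = lookup_number(numbers, "item-z", missing=None)
```

With `missing=None` (or `'none'`) the caller must be prepared for a `None` result, which is why the second overload is typed as returning `int | None`.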
- numbers(terms, missing='error')
Look up the numbers for an array of terms or IDs.
- term(num)
Look up the term with a particular number. Negative indexing is not supported.
- terms(nums=None)
Get a list of terms, optionally for an array of term numbers.
- Parameters:
nums (list[int] | ndarray[Any, dtype[integer]] | Series | None) – The numbers (indices) of the terms to retrieve. If None, returns all terms.
- Returns:
The terms corresponding to the specified numbers, or the full array of terms (in order) if nums=None.
- Return type:
ndarray[tuple[int], dtype[integer[Any] | str_ | bytes_ | object_]]
- copy()
Return a (cheap) copy of this vocabulary. It retains the same mapping, but will not be updated if the original vocabulary has new terms added. However, since new terms are always added to the end, it will be compatible with the original vocabulary for all terms recorded at the time of the copy.
This method is useful for saving known vocabularies in model training.
- Return type:
- compatible_with_numbers_from(other)
Check if this vocabulary is compatible with numbers from another vocabulary. The two are compatible if the other vocabulary has no more terms than this one, and the common prefix has identical IDs.
- Parameters:
other (Vocabulary | None) – The other vocabulary.
- Returns:
True if the same IDs will produce the same numbers from both vocabularies.
- Return type:
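The prefix rule behind this check, and its relationship to `copy()`, can be sketched with plain lists of terms. `numbers_compatible` is a hypothetical function written for this illustration. It restates the documented condition and is not LensKit's implementation.

```python
def numbers_compatible(this_terms, other_terms):
    """True if numbers taken from other_terms mean the same in this_terms.

    Hypothetical restatement of the documented rule: the other vocabulary
    is no longer than this one, and the shared prefix is identical.
    """
    n = len(other_terms)
    return n <= len(this_terms) and list(this_terms[:n]) == list(other_terms)


saved = ["i1", "i2", "i3"]        # snapshot taken at training time
live = saved + ["i4"]             # the original later gained a term
reordered = ["i2", "i1", "i3", "i4"]

live_accepts_saved = numbers_compatible(live, saved)            # True
saved_accepts_live = numbers_compatible(saved, live)            # False
reordered_accepts_saved = numbers_compatible(reordered, saved)  # False
```

The check is directional: a vocabulary that has grown since a copy was taken still accepts numbers from the copy, but not the other way around.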