lenskit.data.Vocabulary
- class lenskit.data.Vocabulary(keys=None, name=None, *, reorder=True)
Bases: object
Vocabularies of entity identifiers for the LensKit data model.
This class supports bidirectional mappings between key-like data and contiguous nonnegative integer indices. Its key use is to facilitate the entity ID vocabularies in Dataset, but it can also be used for things like item tags.
IDs in a vocabulary must be unique. Constructing a vocabulary with reorder=True ensures uniqueness (and sorts the IDs), but does not preserve the order of IDs in the original input.
It is currently a wrapper around pandas.Index, but this fact is not part of the stable public API.
- Parameters:
keys (IDSequence | pd.Index | Iterable[ID] | None) – The IDs to put in the vocabulary.
name (str | None) – The vocabulary name (i.e. the entity class it stores IDs for).
reorder (bool) – If True (the default, as shown in the signature), sort and deduplicate the IDs. If False, use the IDs as-is (assigning each to its position in the input sequence).
- Stability:
- Caller (see Stability Levels).
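To illustrate the two construction modes, here is a minimal standard-library sketch. It is not LensKit code and does not call LensKit: `build_vocabulary` is a hypothetical helper that returns the terms in number order plus a term-to-number dict. With reorder=True it sorts and deduplicates (so the IDs must be mutually comparable); with reorder=False it keeps input positions. Rejecting duplicates in the reorder=False branch is an assumption of this sketch, based only on the rule that IDs must be unique.

```python
from typing import Hashable, Iterable


def build_vocabulary(
    keys: Iterable[Hashable], *, reorder: bool = True
) -> tuple[list[Hashable], dict[Hashable, int]]:
    """Hypothetical sketch: return (terms, term -> number) for a vocabulary.

    Not LensKit's implementation; it only mirrors the documented semantics.
    """
    if reorder:
        # Sort and deduplicate; the original input order is NOT preserved.
        terms = sorted(set(keys))
    else:
        # Keep IDs as given; each ID's number is its position in the input.
        terms = list(keys)
        # Assumption of this sketch: reject duplicates, since IDs must be unique.
        if len(set(terms)) != len(terms):
            raise ValueError("vocabulary IDs must be unique")
    # Contiguous nonnegative numbers 0 .. len(terms) - 1.
    index = {term: num for num, term in enumerate(terms)}
    return terms, index


terms, index = build_vocabulary(["b", "a", "c", "a"])
print(terms)  # ['a', 'b', 'c']
print(index)  # {'a': 0, 'b': 1, 'c': 2}

terms, index = build_vocabulary(["b", "a", "c"], reorder=False)
print(terms)  # ['b', 'a', 'c']
print(index)  # {'b': 0, 'a': 1, 'c': 2}
```

The first call shows the input order being lost under reorder=True; the second keeps it, so "b" gets number 0.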
- __init__(keys=None, name=None, *, reorder=True)
Methods
__init__([keys, name, reorder])
id(num) – Alias for term() for greater readability for entity ID vocabularies.
id_array()
ids() – Alias for terms() for greater readability for entity ID vocabularies.
number() – Look up the number for a vocabulary ID.
numbers() – Look up the numbers for an array of terms or IDs.
term(num) – Look up the term with a particular number.
terms() – Get a list of terms, optionally for an array of term numbers.
Attributes
The vocabulary as a Pandas index.
Current vocabulary size.
The name of the vocabulary (e.g. “user”, “item”).
- property index: Index
The vocabulary as a Pandas index.
- Stability:
- Internal (see Stability Levels).
- number(term: object, missing: Literal['error'] = 'error') → int
- number(term: object, missing: Literal['none'] | None) → int | None
Look up the number for a vocabulary ID.
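A minimal sketch of single-ID lookup, assuming a hypothetical module-level `INDEX` dict in place of the vocabulary's internal state. It models the two `missing` modes from the overloads above: 'error' fails on an unknown ID, while 'none' (or None) returns None. Raising KeyError is this sketch's assumption; the signatures do not name an exception type.

```python
from typing import Hashable, Literal, Optional

# Hypothetical vocabulary state: ID -> number, as built for ["a", "b", "c"].
INDEX: dict[Hashable, int] = {"a": 0, "b": 1, "c": 2}


def number(
    term: Hashable, missing: Optional[Literal["error", "none"]] = "error"
) -> Optional[int]:
    """Sketch of a single-ID lookup with the two `missing` modes."""
    num = INDEX.get(term)
    if num is None and missing == "error":
        # Assumption: this sketch raises KeyError for an unknown ID.
        raise KeyError(term)
    # missing="none" (or None): report an unknown ID as None instead of failing.
    return num


print(number("b"))                  # 1
print(number("z", missing="none"))  # None
```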
- numbers(terms: Sequence[Hashable] | numpy.typing.ArrayLike, missing: Literal['error', 'negative'] = 'error', *, format: Literal['numpy'] = 'numpy') → ndarray[tuple[Any, ...], dtype[int32]]
- numbers(terms: Sequence[Hashable] | numpy.typing.ArrayLike, missing: Literal['error', 'negative', 'null'] = 'error', *, format: Literal['arrow']) → Int32Array
Look up the numbers for an array of terms or IDs.
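The vectorized counterpart can be sketched the same way. The hypothetical `numbers` function below returns a plain Python list rather than the int32 NumPy array or Arrow Int32Array named in the signatures, and it uses -1 as the marker for missing='negative'; that exact negative value is an assumption of the sketch. The Arrow-only 'null' mode is not modeled.

```python
from typing import Hashable, Literal, Sequence

# Hypothetical vocabulary state: ID -> number, as built for ["a", "b", "c"].
INDEX: dict[Hashable, int] = {"a": 0, "b": 1, "c": 2}


def numbers(
    terms: Sequence[Hashable], missing: Literal["error", "negative"] = "error"
) -> list[int]:
    """Sketch of a vectorized lookup; returns a list instead of an int32 array."""
    result = []
    for term in terms:
        num = INDEX.get(term)
        if num is None:
            if missing == "error":
                raise KeyError(term)
            # missing="negative": mark unknown IDs with -1 (assumed sentinel).
            num = -1
        result.append(num)
    return result


print(numbers(["c", "a"]))                           # [2, 0]
print(numbers(["c", "z", "a"], missing="negative"))  # [2, -1, 0]
```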
- term(num)
Look up the term with a particular number. Negative indexing is not supported.
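A sketch of reverse lookup, with a hypothetical `TERMS` list holding IDs in number order. Because Python lists silently accept negative indices, the sketch rejects them explicitly; raising IndexError is an assumption about how "not supported" is handled, not documented LensKit behavior.

```python
from typing import Hashable

# Hypothetical vocabulary state: terms in number order.
TERMS: list[Hashable] = ["a", "b", "c"]


def term(num: int) -> Hashable:
    """Sketch: return the ID whose number is `num`."""
    # A plain list would return TERMS[-1] == "c"; the vocabulary does not
    # support negative indexing, so reject it instead of wrapping around.
    if num < 0 or num >= len(TERMS):
        raise IndexError(f"vocabulary number {num} out of range")
    return TERMS[num]


print(term(0))  # a
print(term(2))  # c
```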
- terms(nums: list[int] | ndarray[tuple[Any, ...], dtype[integer]] | Series | None = None, *, format: Literal['numpy'] = 'numpy') → ndarray[tuple[int], dtype[integer[Any] | str_ | bytes_ | object_]]
- terms(nums: list[int] | ndarray[tuple[Any, ...], dtype[integer]] | Series | None = None, *, format: Literal['arrow']) → Array
Get a list of terms, optionally for an array of term numbers.
- Parameters:
nums – The numbers (indices) of the terms to retrieve. If None, returns all terms.
- Returns:
The terms corresponding to the specified numbers, or the full array of terms (in order) if nums=None.
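A sketch of the two calling modes of terms(), again using a hypothetical `TERMS` list rather than LensKit itself, and returning a Python list rather than a NumPy or Arrow array. Passing None yields every term in number order; passing numbers yields the corresponding terms in the order requested.

```python
from typing import Hashable, Optional, Sequence

# Hypothetical vocabulary state: terms in number order.
TERMS: list[Hashable] = ["a", "b", "c"]


def terms(nums: Optional[Sequence[int]] = None) -> list[Hashable]:
    """Sketch: all terms in number order, or the terms for the given numbers."""
    if nums is None:
        # nums=None: return the full vocabulary, ordered by number.
        return list(TERMS)
    # Otherwise, map each number back to its term, preserving the order of `nums`.
    return [TERMS[n] for n in nums]


print(terms())        # ['a', 'b', 'c']
print(terms([2, 0]))  # ['c', 'a']
```

Since ids() is documented as an alias for terms(), the same two calling modes apply to it.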
- ids(nums: list[int] | ndarray[tuple[Any, ...], dtype[integer]] | Series | None = None, *, format: Literal['numpy'] = 'numpy') → ndarray[tuple[int], dtype[integer[Any] | str_ | bytes_ | object_]]
- ids(nums: list[int] | ndarray[tuple[Any, ...], dtype[integer]] | Series | None = None, *, format: Literal['arrow']) → Array
Alias for terms() for greater readability for entity ID vocabularies.