lenskit.data.Vocabulary#

class lenskit.data.Vocabulary(keys=None, name=None, *, reorder=True)#

Bases: object

Vocabularies of entity identifiers for the LensKit data model.

This class supports bidirectional mappings between key-like data and contiguous nonnegative integer indices. Its primary use is to back the entity ID vocabularies in Dataset, but it can also be used for things like item tags.

IDs in a vocabulary must be unique. Constructing a vocabulary with reorder=True ensures uniqueness (and sorts the IDs), but does not preserve the order of IDs in the original input.

It is currently a wrapper around pandas.Index, but this fact is not part of the stable public API.
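The mapping described above can be pictured with a small pure-Python sketch. This is not the LensKit implementation: `build_vocabulary` is a hypothetical helper, and raising `ValueError` on duplicates when `reorder=False` is an assumption made for illustration only.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def build_vocabulary(
    keys: Iterable[Hashable], reorder: bool = True
) -> Tuple[List[Hashable], Dict[Hashable, int]]:
    """Hypothetical stand-in: return (number -> term list, term -> number dict)."""
    terms = list(keys)
    if reorder:
        # Sorting a set both deduplicates the IDs and puts them in a canonical
        # order, so the input order is not preserved.
        terms = sorted(set(terms))
    elif len(set(terms)) != len(terms):
        # Assumption for this sketch: reject duplicates when IDs are used as-is.
        raise ValueError("IDs in a vocabulary must be unique")
    # Each ID is numbered by its position, giving contiguous indices 0..n-1.
    numbers = {term: i for i, term in enumerate(terms)}
    return terms, numbers


terms, numbers = build_vocabulary(["b", "a", "c", "a"])
print(terms)    # ['a', 'b', 'c']
print(numbers)  # {'a': 0, 'b': 1, 'c': 2}
```

With `reorder=False` the same helper keeps the input order, so `build_vocabulary([3, 1, 2], reorder=False)` numbers `3` as `0`.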

Parameters:

keys (IDSequence | pd.Index | Iterable[ID] | None) – The IDs to put in the vocabulary.
name (str | None) – The vocabulary name (i.e. the entity class it stores IDs for).
reorder (bool) – If True (the default), sort and deduplicate the IDs. If False, use the IDs as-is (assigning each ID its position in the input sequence).

Stability:

Caller (see Stability Levels).

__init__(keys=None, name=None, *, reorder=True)#

Parameters:

keys (IDSequence | pd.Index | Iterable[ID] | None)
name (str | None)
reorder (bool)

Methods

`__init__`([keys, name, reorder])
`id`(num)	Alias for `term()` for greater readability for entity ID vocabularies.
`id_array`()
`ids`()	Alias for `terms()` for greater readability for entity ID vocabularies.
`number`()	Look up the number for a vocabulary ID.
`numbers`()	Look up the numbers for an array of terms or IDs.
`term`(num)	Look up the term with a particular number.
`terms`()	Get a list of terms, optionally for an array of term numbers.

Attributes

`index`	The vocabulary as a Pandas index.
`size`	Current vocabulary size.
`name`	The name of the vocabulary (e.g. “user”, “item”).

name: str | None#: The name of the vocabulary (e.g. “user”, “item”).

property index: Index#

The vocabulary as a Pandas index.

Stability:: Internal (see Stability Levels).

property size: int#: Current vocabulary size.

number(term: object, missing: Literal['error'] = 'error') → int#
number(term: object, missing: Literal['none'] | None) → int | None: Look up the number for a vocabulary ID.

numbers(terms: Sequence[Hashable] | ArrayLike, missing: Literal['error', 'negative'] = 'error', *, format: Literal['numpy'] = 'numpy') → ndarray[tuple[Any, ...], dtype[int32]]#
numbers(terms: Sequence[Hashable] | ArrayLike, missing: Literal['error', 'negative', 'null'] = 'error', *, format: Literal['arrow']) → Int32Array: Look up the numbers for an array of terms or IDs.
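The `missing` options in the signatures above can be illustrated with a pure-Python sketch. It is a hypothetical stand-in, not the library code: `lookup_numbers` is an invented name, and the use of `KeyError` for `'error'` and `-1` for `'negative'` are assumptions that this page does not confirm. Note that the signatures only list `'null'` for the Arrow format.

```python
from typing import Dict, Hashable, List, Optional, Sequence


def lookup_numbers(
    index: Dict[Hashable, int],
    terms: Sequence[Hashable],
    missing: str = "error",
) -> List[Optional[int]]:
    """Hypothetical sketch of a batch term -> number lookup."""
    if missing not in ("error", "negative", "null"):
        raise ValueError(f"unknown missing policy: {missing!r}")
    out: List[Optional[int]] = []
    for term in terms:
        num = index.get(term)
        if num is None:
            if missing == "error":
                # Assumed behavior: fail loudly on an unknown term.
                raise KeyError(term)
            # Assumed sentinel: -1 for 'negative', None (null) for 'null'.
            num = -1 if missing == "negative" else None
        out.append(num)
    return out


index = {"a": 0, "b": 1, "c": 2}
print(lookup_numbers(index, ["b", "z", "a"], missing="negative"))  # [1, -1, 0]
print(lookup_numbers(index, ["b", "z", "a"], missing="null"))      # [1, None, 0]
```

A negative sentinel keeps the result a plain integer array, which is convenient when filtering out unknown IDs with a mask such as `nums >= 0`.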

term(num)#

Look up the term with a particular number. Negative indexing is not supported.

Parameters:: num (int)
Return type:: object

terms(nums: list[int] | ndarray[tuple[Any, ...], dtype[integer]] | Series | None = None, *, format: Literal['arrow']) → Array

Get a list of terms, optionally for an array of term numbers.

Parameters:: nums – The numbers (indices) of the terms to retrieve. If None, returns all terms.
Returns:: The terms corresponding to the specified numbers, or the full array of terms (in order) if nums=None.
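The reverse direction (number to term) can be sketched the same way. The helpers `term_for` and `terms_for` are hypothetical names; rejecting negative numbers with `IndexError` is an assumption, chosen here to mirror the note that negative indexing is not supported.

```python
from typing import Hashable, List, Optional, Sequence


def term_for(terms: Sequence[Hashable], num: int) -> Hashable:
    """Hypothetical sketch: look up a single term by its number."""
    if num < 0:
        # Python lists would wrap around on negative indices; reject them instead.
        raise IndexError(f"negative term number: {num}")
    return terms[num]


def terms_for(
    terms: Sequence[Hashable], nums: Optional[Sequence[int]] = None
) -> List[Hashable]:
    """Hypothetical sketch: look up several terms, or all of them if nums is None."""
    if nums is None:
        return list(terms)
    return [term_for(terms, n) for n in nums]


vocab_terms = ["a", "b", "c"]
print(term_for(vocab_terms, 1))        # b
print(terms_for(vocab_terms, [2, 0]))  # ['c', 'a']
print(terms_for(vocab_terms))          # ['a', 'b', 'c']
```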

id(num)#

Alias for term() for greater readability for entity ID vocabularies.

Parameters:: num (int)
Return type:: object

ids(nums: list[int] | ndarray[tuple[Any, ...], dtype[integer]] | Series | None = None, *, format: Literal['numpy'] = 'numpy') → ndarray[tuple[int], dtype[integer[Any] | str_ | bytes_ | object_]]#
ids(nums: list[int] | ndarray[tuple[Any, ...], dtype[integer]] | Series | None = None, *, format: Literal['arrow']) → Array: Alias for terms() for greater readability for entity ID vocabularies.
