lenskit.data.ItemListCollection

class lenskit.data.ItemListCollection(key, *, index=True)

Bases: Generic[K]

A collection of item lists.

An item list collection consists of a sequence of item lists with associated keys following a fixed schema. Item list collections support iteration (in order) and lookup by key. They are used to represent a variety of things, including test data and the results of a batch run.

The key schema can be specified either by a list of field names, or by providing a named tuple class (created by either namedtuple() or NamedTuple) defining the key schema. Schemas should not be nested: field values must be scalars, not tuples or lists. Keys should also be hashable.
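The two ways of describing a key schema can be illustrated with the standard library alone. The sketch below does not import LensKit; `UserSeqKey` and `DerivedKey` are hypothetical names, and the code only shows that a list of field names and a named tuple class describe the same flat, hashable schema.

```python
from collections import namedtuple
from typing import NamedTuple


# Option 1: a named tuple class defines the key schema directly.
class UserSeqKey(NamedTuple):
    user_id: int
    seq_no: int


# Option 2: a list of field names; an equivalent key type can be derived from it.
field_names = ["user_id", "seq_no"]
DerivedKey = namedtuple("DerivedKey", field_names)

k1 = UserSeqKey(user_id=42, seq_no=0)
k2 = DerivedKey(42, 0)

# Both schemas expose the same field names.
same_fields = UserSeqKey._fields == DerivedKey._fields

# Keys with scalar fields are hashable, so they can serve as dictionary keys.
lookup_table = {k1: "first list"}

# Named tuples compare and hash like plain tuples, so k2 finds k1's entry.
found = lookup_table[k2]
```

Because every field is a scalar, a key behaves like an ordinary flat tuple; nesting a list inside a key would make it unhashable.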

This class exists, instead of using raw dictionaries or lists, to consistently handle some of the nuances of multi-valued keys and of different collections having different key fields. For example, if a run produces item lists keyed by both user ID and sequence number, but your test data is only indexed by user ID, the projected lookup capabilities make it easy to find the test data that goes with an item list in the run.

Item list collections support lookup by index, like a list, returning a tuple of the key and list. If they are constructed with index=True, they also support lookup by _key_, supplied as either a tuple or an instance of the key type; in this case, the key is not returned. If more than one item with the same key is inserted into the collection, then the _last_ one is returned (just like a dictionary).
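The lookup semantics described above can be mimicked with a small stand-in class. This is a sketch under stated assumptions, not LensKit's implementation: `MiniCollection` and `Key` are hypothetical, item lists are plain Python lists, and raising `TypeError` for an unindexed collection is an assumption made for illustration.

```python
from collections import namedtuple

Key = namedtuple("Key", ["user_id"])


class MiniCollection:
    """Hypothetical stand-in that mimics the lookup semantics described above."""

    def __init__(self, index=True):
        self._entries = []                    # (key, items) pairs in insertion order
        self._index = {} if index else None   # key -> position of the latest entry

    def add(self, items, key):
        self._entries.append((key, items))
        if self._index is not None:
            # A later entry with the same key replaces the earlier index slot.
            self._index[key] = len(self._entries) - 1

    def __getitem__(self, pos):
        # Positional access returns the key together with the list.
        return self._entries[pos]

    def lookup(self, key):
        # Key access returns only the list, or None if the key is absent.
        if self._index is None:
            raise TypeError("collection was built without an index")
        pos = self._index.get(tuple(key))
        return None if pos is None else self._entries[pos][1]


coll = MiniCollection()
coll.add([10, 20], Key(1))
coll.add([30], Key(1))        # same key inserted a second time

by_pos = coll[0]              # key and list of the first entry
by_key = coll.lookup((1,))    # list only; the last entry for the key wins
```

Both entries remain reachable by position, while key lookup sees only the most recent one, which matches the dictionary-like behavior described in the text.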

Parameters:
  • key (type[K] | Sequence[str]) – The type (a NamedTuple class) or list of field names specifying the key schema.

  • index (bool) – Whether or not to index lists by key to facilitate fast lookups.

__init__(key, *, index=True)

Create a new item list collection.

Methods

__init__(key, *[, index])

Create a new item list collection.

add(list, *fields, **kwfields)

Add a single item list to this collection.

add_from(other, **fields)

Add all lists from another collection to this collection.

from_df(df, key, *others)

Create an item list collection from a data frame.

from_dict()

Create an item list collection from a dictionary.

keys()

Iterate over keys.

lists()

Iterate over item lists without keys.

load_parquet(path)

Load an item list collection from a Parquet file using the native layout.

lookup()

Look up a list by key.

lookup_projected(key)

Look up an item list using a projected key.

save_parquet(path, *[, layout, batch_size, ...])

Save this item list collection to a Parquet file.

to_arrow(*[, batch_size])

Convert this item list collection to an Arrow table.

to_df()

Convert this item list collection to a data frame.

Attributes

key_fields

The names of the key fields.

key_type

The type of collection keys.

classmethod from_dict(data: Mapping[tuple[int | str | bytes | integer[Any] | str_ | bytes_ | object_, ...] | int | str | bytes | integer[Any] | str_ | bytes_ | object_, ItemList], key: type[K]) → ItemListCollection[K]
classmethod from_dict(data: Mapping[tuple[int | str | bytes | integer[Any] | str_ | bytes_ | object_, ...] | int | str | bytes | integer[Any] | str_ | bytes_ | object_, ItemList], key: Sequence[str] | str | None = None) → ItemListCollection[tuple[int | str | bytes | integer[Any] | str_ | bytes_ | object_, ...]]

Create an item list collection from a dictionary.

classmethod from_df(df, key, *others)

Create an item list collection from a data frame.

Note

Keys with empty item lists will be silently excluded from the output data.

Parameters:
  • df (DataFrame) – The data frame to convert.

  • key (type[K] | Sequence[str | AliasedColumn] | str | AliasedColumn) – The key type or field(s). Can be specified as a single column name (or AliasedColumn).

  • others (str | AliasedColumn) – Other columns to consider; primarily used to pass additional aliased columns to normalize other columns like the item ID.

to_df()

Convert this item list collection to a data frame.

Warning

If this item list collection has any keys with empty lists, those lists will be excluded from the output.

Return type:

DataFrame

to_arrow(*, batch_size=5000)

Convert this item list collection to an Arrow table.

The resulting table has one row per item list, with the item list contents in an items column of a structured list type. This preserves empty item lists for higher-fidelity data storage.

Parameters:

batch_size (int) – The Arrow record batch size.

Return type:

Table

save_parquet(path, *, layout='native', batch_size=5000, compression='zstd')

Save this item list collection to a Parquet file. This supports two types of Parquet files: “native” collections store one row per list, with the item list contents in a repeated structure column named items; this layout fully preserves the item list collection, including empty item lists. The “flat” layout is easier to work with in software such as Pandas, but cannot store empty item lists.

Parameters:
  • layout (Literal['native', 'flat']) – The table layout to use.

  • batch_size (int) – The Arrow record batch size.

  • compression (Literal['zstd', 'gzip', 'snappy', 'lz4'] | None) – The compression scheme to use.

  • path (PathLike[str]) – Path of the Parquet file to write.

Return type:

None
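The difference between the two layouts can be sketched with plain Python rows. This is an illustration only: it writes no Parquet, the `item_id` column name is an assumption, and the row dictionaries merely stand in for table rows.

```python
# Three keyed item lists; user 2 has an empty list.
collection = [
    ({"user_id": 1}, [10, 20]),
    ({"user_id": 2}, []),
    ({"user_id": 3}, [30]),
]

# "Native"-style rows: one row per list, with the items nested inside the row.
native_rows = [
    {**key, "items": [{"item_id": i} for i in items]}
    for key, items in collection
]

# "Flat"-style rows: one row per item, with the key fields repeated on every row.
flat_rows = [
    {**key, "item_id": i}
    for key, items in collection
    for i in items
]

native_users = [row["user_id"] for row in native_rows]
flat_users = sorted({row["user_id"] for row in flat_rows})
```

User 2 still has a row in `native_rows` (with an empty `items` value) but contributes no rows to `flat_rows`, which is why the flat layout cannot represent empty item lists.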

classmethod load_parquet(path)

Load an item list collection from a Parquet file using the native layout.

Note

To load item list collections in the flat layout, use Pandas and from_df().

Parameters:

path (PathLike[str] | list[PathLike[str]]) – Path to the Parquet file (or list of files) to load.

Return type:

ItemListCollection

property key_fields: tuple[str]

The names of the key fields.

property key_type: type[K]

The type of collection keys.

add(list, *fields, **kwfields)

Add a single item list to this collection.

add_from(other, **fields)

Add all lists from another collection to this collection. If field values are supplied, they are used to supplement or overwrite the keys in other; a common use case is to add results from multiple recommendation runs and save them together in a single collection.
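How supplied field values supplement the keys can be sketched without LensKit. The function below is a hypothetical stand-in that shares only its name with the method; `RunKey`, `CombinedKey`, and the `model` field are assumptions made for illustration, and collections are modeled as plain lists of (key, items) pairs.

```python
from collections import namedtuple

RunKey = namedtuple("RunKey", ["user_id"])
CombinedKey = namedtuple("CombinedKey", ["model", "user_id"])


def add_from(target, other, **fields):
    """Append every (key, items) pair from other, overriding key fields with fields."""
    for key, items in other:
        # Start from the source key's fields, then supplement or overwrite them.
        merged = {**key._asdict(), **fields}
        target.append((CombinedKey(**merged), items))


als_run = [(RunKey(1), [10, 20]), (RunKey(2), [30])]
knn_run = [(RunKey(1), [40])]

combined = []
add_from(combined, als_run, model="als")
add_from(combined, knn_run, model="knn")
```

After both calls, the lists for user 1 from the two runs sit side by side in `combined`, distinguished by the added `model` field.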

lookup(key: tuple) → ItemList | None
lookup(*key: int | str | bytes | integer[Any] | str_ | bytes_ | object_, **kwkey: int | str | bytes | integer[Any] | str_ | bytes_ | object_) → ItemList | None

Look up a list by key. If multiple lists have the same key, this returns the last (like a dictionary).

This method can be called with the key tuple as a single argument (and this can be either the actual named tuple, or an ordinary tuple of the same length), or with the individual key fields as positional or named arguments.

Parameters:

key – The key tuple or key tuple fields.
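The accepted call styles all reduce to the same key. The helper below is a hypothetical sketch of that normalization, not LensKit's code; `normalize_key` and the two-field `Key` type are assumptions made for illustration.

```python
from collections import namedtuple

Key = namedtuple("Key", ["user_id", "seq_no"])


def normalize_key(*args, **kwargs):
    """Turn any of the accepted call styles into a Key instance."""
    if len(args) == 1 and isinstance(args[0], tuple) and not kwargs:
        # A single tuple argument: a named tuple or a plain tuple of the same length.
        return Key(*args[0])
    # Otherwise treat the arguments as individual key fields.
    return Key(*args, **kwargs)


a = normalize_key(Key(5, 1))             # the actual named tuple
b = normalize_key((5, 1))                # an ordinary tuple
c = normalize_key(5, 1)                  # positional fields
d = normalize_key(user_id=5, seq_no=1)   # named fields
e = normalize_key(5, seq_no=1)           # a mix of both
```

The single-tuple check is unambiguous only because key fields are scalars, as the schema rules require.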

lookup_projected(key)

Look up an item list using a projected key. A projected key is a key that may have additional fields beyond those defined by this collection; the extra fields are ignored for the purposes of lookup.

Parameters:

key (tuple) – The key. Must be a named tuple (e.g. a key obtained from another item list collection).

Returns:

The item list with the specified key, projected to this collection’s key fields, or None if no such list exists.

Return type:

ItemList | None
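Key projection can be sketched with named tuples and a dictionary. This is a minimal illustration, not the library's implementation: `project`, `RunKey`, and `TestKey` are hypothetical names, and a plain dictionary stands in for the indexed test data.

```python
from collections import namedtuple

RunKey = namedtuple("RunKey", ["user_id", "seq_no"])   # keys produced by a run
TestKey = namedtuple("TestKey", ["user_id"])           # keys of the test data


def project(key, target_type):
    """Keep only the fields that target_type defines, dropping the rest."""
    return target_type(**{f: getattr(key, f) for f in target_type._fields})


test_data = {TestKey(7): ["item-a", "item-b"]}

run_key = RunKey(user_id=7, seq_no=3)
projected = project(run_key, TestKey)          # seq_no is ignored
match = test_data.get(projected)               # test list for user 7
missing = test_data.get(project(RunKey(8, 0), TestKey))   # no entry for user 8
```

The projection relies on field names, which is why the key must be a named tuple rather than a plain one.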

lists()

Iterate over item lists without keys.

Return type:

Iterator[ItemList]

keys()

Iterate over keys.

Return type:

Iterator[K]