lenskit.data.EntitySet

class lenskit.data.EntitySet(name, schema, vocabulary, table, _sel=None)

Bases: object

Representation of a set of entities from the dataset. Obtained from Dataset.entities().

__init__(name, schema, vocabulary, table, _sel=None)

Methods

__init__(name, schema, vocabulary, table[, _sel])

arrow()

Get these entities and their attributes as a PyArrow table.

attribute(name)

Get values of an attribute for the entities in this entity set.

count()

Return the number of entities in this entity set.

ids()

Get the identifiers of the entities in this set.

numbers()

Get the numbers (from the vocabulary) for the entities in this set.

pandas()

Get the entities and their attributes as a Pandas data frame.

select(*[, ids, numbers])

Select a subset of the entities in this set.

Attributes

attributes

name

The name of the entity class for these entities.

schema

vocabulary

The identifier vocabulary for this schema.

name: str

The name of the entity class for these entities.

vocabulary: Vocabulary

The identifier vocabulary for this schema.

count()

Return the number of entities in this entity set.

Return type:

int

ids()

Get the identifiers of the entities in this set. This is returned directly as a PyArrow array instead of a NumPy array.

Return type:

ndarray[tuple[int], dtype[integer[Any] | str_ | bytes_ | object_]]

numbers()

Get the numbers (from the vocabulary) for the entities in this set.

Return type:

ndarray[int, dtype[int32]]
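
To make the relationship between identifiers and numbers concrete, the following is a minimal sketch of a vocabulary-style mapping. `ToyVocabulary` and its `number` and `id` methods are hypothetical stand-ins written for this illustration, not LensKit's `Vocabulary` class, and the sketch assumes that an entity's number is its zero-based position in the vocabulary.

```python
class ToyVocabulary:
    """Hypothetical stand-in for an identifier vocabulary (not LensKit's class)."""

    def __init__(self, ids):
        self._ids = list(ids)
        # Assumption: each identifier's number is its zero-based position.
        self._index = {ident: num for num, ident in enumerate(self._ids)}

    def number(self, ident):
        """Look up the number assigned to an identifier."""
        return self._index[ident]

    def id(self, num):
        """Look up the identifier stored at a number."""
        return self._ids[num]


vocab = ToyVocabulary(["u10", "u42", "u7"])

# Identifiers map to numbers, and numbers map back to the same identifiers.
numbers = [vocab.number(ident) for ident in ["u10", "u42", "u7"]]
round_trip = [vocab.id(num) for num in numbers]
```

Under these assumptions, `numbers` holds positions into the vocabulary, and looking those positions up again recovers the original identifiers.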

arrow()

Get these entities and their attributes as a PyArrow table.

Return type:

Table

pandas()

Get the entities and their attributes as a Pandas data frame.

Return type:

DataFrame

attribute(name)

Get values of an attribute for the entities in this entity set.

Parameters:

name (str)

Return type:

AttributeSet

select(*, ids=None, numbers=None)

Select a subset of the entities in this set.

Note

The vocabulary is unchanged, so numbers in the resulting set will be entity numbers in the dataset’s vocabulary. They are not rearranged to be relative to this entity set.

Parameters:
  • ids (IDSequence | None) – The entity identifiers to select.

  • numbers (np.ndarray[int, np.dtype[np.integer[Any]]] | pa.IntegerArray[Any] | None) – The entity numbers to select.

Returns:

The entity subset.

Return type:

EntitySet
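
The note on numbering can be illustrated with a small toy model. `ToyEntitySet` below is a hypothetical stand-in that mirrors only the `count()`, `ids()`, `numbers()`, and `select()` method names documented here. It is not the LensKit implementation, and its handling of a call that passes both `ids` and `numbers` (it keeps entities matching both) is an assumption of the sketch.

```python
class ToyEntitySet:
    """Hypothetical stand-in for illustration; not the LensKit implementation."""

    def __init__(self, vocab_ids, selected=None):
        # The vocabulary is shared by every subset and is never rearranged.
        self.vocab_ids = list(vocab_ids)
        # Selected entity numbers; None means "every entity in the vocabulary".
        if selected is None:
            selected = range(len(self.vocab_ids))
        self._sel = list(selected)

    def count(self):
        return len(self._sel)

    def ids(self):
        return [self.vocab_ids[n] for n in self._sel]

    def numbers(self):
        return list(self._sel)

    def select(self, *, ids=None, numbers=None):
        keep = set(self._sel)
        if ids is not None:
            wanted = set(ids)
            keep &= {n for n, ident in enumerate(self.vocab_ids) if ident in wanted}
        if numbers is not None:
            # Assumption of this toy: ids and numbers are combined by intersection.
            keep &= set(numbers)
        # The subset reuses the same vocabulary, so numbers are not renumbered.
        return ToyEntitySet(self.vocab_ids, sorted(keep))


items = ToyEntitySet(["a", "b", "c", "d"])
subset = items.select(ids=["b", "d"])

print(subset.count())    # 2
print(subset.ids())      # ['b', 'd']
print(subset.numbers())  # [1, 3]
```

The subset contains two entities, yet its numbers are 1 and 3 rather than 0 and 1, because they still index into the full vocabulary. Code that uses these numbers as array positions should size its arrays by the vocabulary, not by the subset's `count()`.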