lenskit.data.DatasetBuilder

class lenskit.data.DatasetBuilder(name=None)

Construct datasets from data and tables.

__init__(name=None)

Create a new dataset builder.

Parameters: name (str | DataContainer | Dataset | None) – The dataset name. It can also be a data container or an existing dataset, in which case the builder is initialized with that data's contents so they can be extended or modified.

Methods

`__init__`([name])	Create a new dataset builder.
`add_entities`()
`add_entity_class`(name)
`add_interactions`(cls, data, *[, entities, ...])
`add_list_attribute`()
`add_relationship_class`(name, entities[, ...])
`add_relationships`(cls, data, *[, entities, ...])
`add_scalar_attribute`()
`add_vector_attribute`(cls, name, entities, ...)	Add a vector attribute to a set of entities.
`build`()
`build_container`()
`clear_relationships`(cls)
`entity_classes`()	Get the entity classes defined so far.
`entity_id_type`(name)	Get the PyArrow data type for an entity class's identifiers.
`filter_interactions`(cls[, min_time, ...])	Filter interactions based on timestamp or to remove particular entities.
`record_count`(class_name)
`relationship_classes`()	Get the relationship classes defined so far.
`save`(path)	Save the dataset to disk in the LensKit native format.

Attributes

`name`
`schema`	The data schema assembled so far.

schema: DataSchema

The data schema assembled so far. Do not modify this schema directly.

entity_classes()

Get the entity classes defined so far.

relationship_classes()

Get the relationship classes defined so far.

entity_id_type(name)

Get the PyArrow data type for an entity class’s identifiers.

filter_interactions(cls, min_time=None, max_time=None, remove=None)

Filter interactions based on timestamp or to remove particular entities.

Parameters:

cls (str) – The interaction class to filter.
min_time (int | float | datetime | None) – The minimum interaction time to keep (inclusive).
max_time (int | float | datetime | None) – The maximum interaction time to keep (exclusive).
remove (Table | dict[str, numpy.typing.ArrayLike] | DataFrame | None) – Combinations of entity numbers or IDs to remove. The entities are filtered using an anti-join with this table, so providing a single column of entity IDs or numbers will remove all interactions associated with the listed entities.
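The time bounds form a half-open window [min_time, max_time), and `remove` behaves like an anti-join. The plain-Python sketch below mimics those documented semantics on a list of dictionaries; it does not call LensKit and is not its implementation. The helper `filter_sketch` and the column names `user_id`, `item_id`, and `timestamp` are hypothetical, and timestamps are plain integers for brevity.

```python
from __future__ import annotations

from typing import Any


def filter_sketch(
    rows: list[dict[str, Any]],
    min_time: int | None = None,
    max_time: int | None = None,
    remove: dict[str, list[Any]] | None = None,
) -> list[dict[str, Any]]:
    """Hypothetical helper mimicking the documented filter semantics."""
    # Keep rows with min_time <= timestamp < max_time
    # (min_time inclusive, max_time exclusive).
    kept = [
        r
        for r in rows
        if (min_time is None or r["timestamp"] >= min_time)
        and (max_time is None or r["timestamp"] < max_time)
    ]
    if remove:
        # Anti-join: each row of the removal table is a combination of
        # values for its columns; drop interactions matching any of them.
        cols = list(remove)
        drop = set(zip(*(remove[c] for c in cols)))
        kept = [r for r in kept if tuple(r[c] for c in cols) not in drop]
    return kept


# Hypothetical interaction records.
rows = [
    {"user_id": "u1", "item_id": "i1", "timestamp": 10},
    {"user_id": "u1", "item_id": "i2", "timestamp": 20},
    {"user_id": "u2", "item_id": "i1", "timestamp": 20},
    {"user_id": "u2", "item_id": "i3", "timestamp": 30},
]

window = filter_sketch(rows, min_time=20, max_time=30)
no_u1 = filter_sketch(rows, remove={"user_id": ["u1"]})
no_pair = filter_sketch(rows, remove={"user_id": ["u2"], "item_id": ["i1"]})
```

With a single `user_id` column, every interaction for `u1` is dropped; with both `user_id` and `item_id`, only the single `(u2, i1)` interaction is removed.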

add_vector_attribute(cls, name, entities, values, /, dim_names=None)

Add a vector attribute to a set of entities.

Warning

The vector is stored densely, even for entities for which it is not set. High-dimensional vectors can therefore take up a lot of space.
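To gauge what dense storage costs, the value buffer takes roughly entities × dimensions × bytes per value, counted over all entities in the class rather than only those with a vector. The sketch below does that arithmetic. It assumes 32-bit floating-point values and ignores validity bitmaps and other Arrow overhead; the helper `dense_vector_bytes` and the entity counts are hypothetical.

```python
def dense_vector_bytes(n_entities: int, dim: int, bytes_per_value: int = 4) -> int:
    """Hypothetical helper: approximate value-buffer size of a dense vector attribute."""
    # Every entity gets a full row of `dim` values, whether or not a
    # vector was supplied for it.
    return n_entities * dim * bytes_per_value


# Hypothetical sizes: 2 million items with 768-dimensional float32 vectors,
# of which only 100,000 items actually have a vector set.
total = dense_vector_bytes(2_000_000, 768)
actually_set = dense_vector_bytes(100_000, 768)

print(f"dense storage: {total / 2**30:.2f} GiB")         # about 5.72 GiB
print(f"set vectors only: {actually_set / 2**30:.2f} GiB")  # about 0.29 GiB
```

In this example, dense storage uses 20 times the space that the set vectors alone would need.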

Parameters:

cls (str) – The entity class name.
name (str) – The attribute name.
entities (IDSequence | tuple[IDSequence, ...]) – The entity IDs to which the attribute should be attached.
values (pa.Array[Any] | pa.ChunkedArray[Any] | np.ndarray[tuple[int, int], Any] | sparray) – The attribute values, as a fixed-length list array, a two-dimensional NumPy array, or a SciPy sparse array.
dim_names (ArrayLike | pd.Index[Any] | Sequence[Any] | None) – The names of the vector's dimensions (the columns of values).

Return type:

None

save(path)

Save the dataset to disk in the LensKit native format.

Parameters: path (str | PathLike[str]) – The path at which to save the dataset. It is created as a directory.