lenskit.data.DatasetBuilder
- class lenskit.data.DatasetBuilder(name=None)
Bases: object
Construct data sets from data and tables.
- __init__(name=None)
Create a new dataset builder.
Methods
__init__([name]) – Create a new dataset builder.
add_entities()
add_entity_class(name)
add_interactions(cls, data, *[, entities, ...])
add_list_attribute()
add_relationship_class(name, entities[, ...])
add_relationships(cls, data, *[, entities, ...])
add_scalar_attribute()
add_vector_attribute(cls, name, entities, ...) – Add a vector attribute to a set of entities.
build()
build_container()
clear_relationships(cls)
entity_classes() – Get the entity classes defined so far.
entity_id_type(name) – Get the PyArrow data type for an entity class’s identifiers.
filter_interactions(cls[, min_time, ...]) – Filter interactions based on timestamp or to remove particular entities.
record_count(class_name)
relationship_classes() – Get the relationship classes defined so far.
save(path) – Save the dataset to disk in the LensKit native format.
Attributes
name
schema – The data schema assembled so far.
- schema: DataSchema
The data schema assembled so far. Do not modify this schema directly.
- entity_classes()
Get the entity classes defined so far.
- Return type:
- relationship_classes()
Get the relationship classes defined so far.
- Return type:
- entity_id_type(name)
Get the PyArrow data type for an entity class’s identifiers.
- filter_interactions(cls, min_time=None, max_time=None, remove=None)
Filter interactions based on timestamp or to remove particular entities.
- Parameters:
cls (str) – The interaction class to filter.
min_time (int | float | datetime | None) – The minimum interaction time to keep (inclusive).
max_time (int | float | datetime | None) – The maximum interaction time to keep (exclusive).
remove (Table | dict[str, numpy.typing.ArrayLike] | DataFrame | None) – Combinations of entity numbers or IDs to remove. The entities are filtered using an anti-join with this table, so providing a single column of entity IDs or numbers will remove all interactions associated with the listed entities.
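The parameter descriptions alone make it hard to picture how the inclusive lower bound, the exclusive upper bound, and the anti-join fit together. The following minimal pandas sketch applies those documented rules to a plain interaction table. The function `filter_interactions_sketch`, the column names, and the sample log are all hypothetical. This is not how `DatasetBuilder` implements filtering internally.

```python
from typing import Optional

import pandas as pd


def filter_interactions_sketch(
    interactions: pd.DataFrame,
    min_time=None,
    max_time=None,
    remove: Optional[pd.DataFrame] = None,
    time_col: str = "timestamp",
) -> pd.DataFrame:
    """Mirror the documented filter rules on a plain DataFrame (hypothetical, not LensKit code)."""
    keep = pd.Series(True, index=interactions.index)
    if min_time is not None:
        keep &= interactions[time_col] >= min_time  # lower bound is inclusive
    if max_time is not None:
        keep &= interactions[time_col] < max_time  # upper bound is exclusive
    kept = interactions[keep]

    if remove is not None:
        # Anti-join: match on whatever columns `remove` provides, drop every matching row.
        merged = kept.merge(
            remove.drop_duplicates(), on=list(remove.columns), how="left", indicator=True
        )
        kept = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    return kept.reset_index(drop=True)


log = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 3],
        "item_id": [10, 20, 10, 30],
        "timestamp": [100, 200, 300, 400],
    }
)
# Keep 200 <= timestamp < 400, then drop every interaction by user 2.
out = filter_interactions_sketch(
    log, min_time=200, max_time=400, remove=pd.DataFrame({"user_id": [2]})
)
print(out)
```

Because `remove` here holds a single `user_id` column, the anti-join removes all of that user's interactions, which matches the behaviour described for a single column of entity IDs.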
- add_vector_attribute(cls, name, entities, values, /, dim_names=None)
Add a vector attribute to a set of entities.
Warning
The vector is stored densely, even for entities for which it is not set. High-dimensional vectors can therefore take up a lot of space.
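The following sketch estimates that cost. It assumes a dense `float32` matrix with one row per entity, where rows with no vector are filled with NaN. The actual dtype and fill value LensKit uses are not stated here, and the entity count and dimension are made-up numbers.

```python
import numpy as np

# Made-up sizes for illustration.
n_entities = 10_000      # every entity in the class gets a row
dim = 512                # length of each vector
n_with_vectors = 100     # entities that actually have a value

# Assumed dense layout: unset rows are filled with NaN but still occupy memory.
dense = np.full((n_entities, dim), np.nan, dtype=np.float32)
rng = np.random.default_rng(0)
dense[:n_with_vectors] = rng.random((n_with_vectors, dim), dtype=np.float32)

dense_bytes = dense.nbytes                                # 10_000 * 512 * 4 bytes
populated_bytes = n_with_vectors * dim * dense.itemsize   # what the set rows alone need
print(f"dense: {dense_bytes / 1e6:.1f} MB, populated rows only: {populated_bytes / 1e6:.1f} MB")
```

In this example, 99% of the rows hold no data yet still account for 99% of the storage.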
- Parameters:
cls (str) – The entity class name.
name (str) – The attribute name.
entities (IDSequence | tuple[IDSequence, ...]) – The entity IDs to which the attribute should be attached.
values (pa.Array[Any] | pa.ChunkedArray[Any] | np.ndarray[tuple[int, int], Any] | sparray) – The attribute values, as a fixed-length list array or a two-dimensional NumPy array.
dim_names (ArrayLike | pd.Index[Any] | Sequence[Any] | None) – The names for the dimensions of the array.
- Return type:
None
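When `values` is a two-dimensional NumPy array, the parameters above imply a shape contract between `entities`, `values`, and `dim_names`. The sketch below assumes that row *k* of `values` holds the vector for the *k*-th entity ID. It covers only a single ID sequence, not the tuple form of `entities`. The helper `check_vector_args`, the `"item"` class, and the `"embedding"` attribute name are hypothetical.

```python
from typing import Optional, Sequence

import numpy as np


def check_vector_args(
    entities: Sequence, values: np.ndarray, dim_names: Optional[Sequence] = None
) -> None:
    """Hypothetical pre-flight shape check (not part of LensKit)."""
    if values.ndim != 2:
        raise ValueError(f"values must be 2-D, got {values.ndim}-D")
    if values.shape[0] != len(entities):
        raise ValueError(f"{len(entities)} entity IDs but {values.shape[0]} rows of values")
    if dim_names is not None and len(dim_names) != values.shape[1]:
        raise ValueError(
            f"{len(dim_names)} dimension names but vectors have length {values.shape[1]}"
        )


item_ids = ["i1", "i2", "i3"]
embeddings = np.arange(12, dtype=np.float32).reshape(3, 4)  # assumed: row k belongs to item_ids[k]
dim_names = ["d0", "d1", "d2", "d3"]
check_vector_args(item_ids, embeddings, dim_names)

# With an actual builder, the documented call would then look like:
# builder.add_vector_attribute("item", "embedding", item_ids, embeddings, dim_names=dim_names)
```

The final comment follows the signature documented above. `cls`, `name`, `entities`, and `values` are positional-only, while `dim_names` may be passed by keyword.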