lenskit.splitting.records
Functions
Function | Description
crossfold_records | Partition a dataset by records into cross-fold partitions.
sample_records | Sample records from a dataset into train-test partitions.
- lenskit.splitting.records.crossfold_records(data, partitions, *, rng_spec=None)
Partition a dataset by records into cross-fold partitions. This partitions the records (ratings, play counts, clicks, etc.) into k partitions, where k is the value of partitions, without regard to users or items.
Since record-based random cross-validation doesn’t make much sense with repeated interactions, this splitter only supports operating on the dataset’s interaction matrix.
- Parameters:
data (Dataset) – Ratings or other data you wish to partition.
partitions (int) – The number of partitions to produce.
rng_spec (RandomSeed | None) – The random number generator or seed (see seedbank.numpy_rng()).
- Returns:
an iterator of train-test pairs
- Return type:
iterator
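To illustrate what partitioning "by records" means, here is a minimal NumPy sketch that shuffles record positions once and splits them into k folds. Each fold serves as the test set exactly once and the remaining folds form the training set. It is not LensKit's implementation. It works on plain record indices 0..n-1 rather than a Dataset, the function name crossfold_record_indices is hypothetical, and numpy.random.default_rng stands in for seedbank.numpy_rng().

```python
from __future__ import annotations

from collections.abc import Iterator

import numpy as np


def crossfold_record_indices(
    n_records: int, partitions: int, seed: int | None = None
) -> Iterator[tuple[np.ndarray, np.ndarray]]:
    """Hypothetical sketch: yield (train_idx, test_idx) for record-level k-fold CV."""
    if partitions < 2:
        raise ValueError("need at least 2 partitions")
    # Assumption: default_rng substitutes for seedbank.numpy_rng().
    rng = np.random.default_rng(seed)
    # Shuffle all record positions once, ignoring users and items entirely.
    perm = rng.permutation(n_records)
    # Split the shuffled positions into k nearly equal test folds.
    folds = np.array_split(perm, partitions)
    for i, test_idx in enumerate(folds):
        # Training data is every record not in the current test fold.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx


# Usage: 10 records into 3 folds gives test folds of sizes 4, 3 and 3.
for train_idx, test_idx in crossfold_record_indices(10, 3, seed=42):
    print(len(train_idx), len(test_idx))
```

Because every record lands in exactly one fold, the union of the test folds covers the whole dataset, which is what distinguishes this from repeated random sampling.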
- lenskit.splitting.records.sample_records(data: Dataset, size: int, *, disjoint: bool = True, rng_spec: RandomSeed | None = None, repeats: None = None) → TTSplit
- lenskit.splitting.records.sample_records(data: Dataset, size: int, *, repeats: int, disjoint: bool = True, rng_spec: RandomSeed | None = None) → Iterator[TTSplit]
Sample records from a dataset into train-test partitions. This function does not care what kind of interaction data is in data, so long as it is a Dataset.
We can loop over a sequence of train-test pairs:
>>> from lenskit.data import load_movielens
>>> movielens = load_movielens('data/ml-latest-small')
>>> for train, test in sample_records(movielens, 1000, repeats=5):
...     print(sum(len(il) for il in test.values()))
1000
1000
1000
1000
1000
Sometimes for testing, it is useful to just get a single pair:
>>> train, test = sample_records(movielens, 1000)
>>> sum(len(il) for il in test.values())
1000
- Parameters:
data – The data set to split.
size – The size of each test sample.
repeats – The number of data splits to produce. If None, produce a single train-test pair instead of an iterator or list.
disjoint – If True, force test samples to be disjoint.
rng_spec – The random number generator or seed (see seedbank.numpy_rng()).
- Returns:
A train-test pair or iterator of such pairs (depending on repeats).
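The sampling behavior of sample_records can be sketched in the same way. This minimal sketch assumes plain record indices instead of a Dataset and draws size test records per split. With disjoint=True it takes all test samples from a single shuffle, so no record is tested twice. With disjoint=False it samples each split independently, which is one plausible reading of that flag. Returning one pair when repeats is None and an iterator otherwise mirrors the two signatures above. The name sample_record_indices is hypothetical, and numpy.random.default_rng stands in for seedbank.numpy_rng().

```python
from __future__ import annotations

from collections.abc import Iterator

import numpy as np

Split = tuple[np.ndarray, np.ndarray]  # (train_idx, test_idx)


def sample_record_indices(
    n_records: int,
    size: int,
    *,
    repeats: int | None = None,
    disjoint: bool = True,
    seed: int | None = None,
) -> Split | Iterator[Split]:
    """Hypothetical sketch of record sampling; not LensKit's implementation."""
    # Assumption: default_rng substitutes for seedbank.numpy_rng().
    rng = np.random.default_rng(seed)
    n_samples = 1 if repeats is None else repeats
    if size > n_records:
        raise ValueError("test sample larger than the data")
    if disjoint and size * n_samples > n_records:
        raise ValueError("not enough records for disjoint test samples")

    def make_split(test_idx: np.ndarray) -> Split:
        # Training set = every record not sampled into this test set.
        mask = np.ones(n_records, dtype=bool)
        mask[test_idx] = False
        return np.flatnonzero(mask), test_idx

    if disjoint:
        # One shuffle, cut into consecutive chunks: test sets never overlap.
        perm = rng.permutation(n_records)
        tests = [perm[i * size:(i + 1) * size] for i in range(n_samples)]
    else:
        # Independent draws: a record may appear in several test sets.
        tests = [rng.choice(n_records, size, replace=False) for _ in range(n_samples)]

    if repeats is None:
        return make_split(tests[0])  # a single train-test pair
    return (make_split(t) for t in tests)  # an iterator of pairs


train_idx, test_idx = sample_record_indices(100, 10, seed=1)
for train_idx, test_idx in sample_record_indices(100, 10, repeats=5, seed=1):
    print(len(train_idx), len(test_idx))
```

Checking feasibility before sampling means that an impossible disjoint request (for example, 3 samples of 4 records from 10 records) fails immediately rather than partway through iteration.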