lenskit.splitting.records

Functions

`crossfold_records`(data, partitions, *[, ...])	Partition a dataset by records into cross-fold partitions.
`sample_records`()	Sample a frame of ratings into train-test partitions.

lenskit.splitting.records.crossfold_records(data, partitions, *, rng_spec=None)

Partition a dataset by records into cross-fold partitions. This partitions the records (ratings, play counts, clicks, etc.) into k partitions without regard to users or items.

Since record-based random cross-validation doesn’t make much sense with repeated interactions, this splitter only supports operating on the dataset’s interaction matrix.

Parameters:

data (Dataset) – Ratings or other data you wish to partition.
partitions (int) – The number of partitions to produce.
rng_spec (RandomSeed | None) – The random number generator or seed (see seedbank.numpy_rng()).

Returns:

an iterator of train-test pairs

Return type:

iterator
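
The idea of record-based cross-folding can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not LensKit's implementation: `crossfold_sketch` is a hypothetical helper, the records are a plain list of made-up (user, item) tuples rather than a Dataset, and an integer seed stands in for rng_spec.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def crossfold_sketch(
    records: Sequence[T], partitions: int, seed: Optional[int] = None
) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs; each record lands in exactly one test set."""
    rng = random.Random(seed)
    # Shuffle record positions once, ignoring which user or item each belongs to.
    order = list(range(len(records)))
    rng.shuffle(order)
    for k in range(partitions):
        # Every `partitions`-th shuffled position, starting at k, forms test fold k.
        test_idx = set(order[k::partitions])
        train = [r for i, r in enumerate(records) if i not in test_idx]
        test = [r for i, r in enumerate(records) if i in test_idx]
        yield train, test


# 20 made-up (user, item) records, split into 5 folds of 4 test records each.
ratings = [(u, i) for u in range(5) for i in range(4)]
folds = list(crossfold_sketch(ratings, 5, seed=42))
```

Because the shuffle ignores users and items, a given user's records may fall entirely in one fold's training data or be spread across several test sets; that is what "without regard to users or items" means in practice.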

lenskit.splitting.records.sample_records(data: Dataset, size: int, *, disjoint: bool = True, rng_spec: RandomSeed | None = None, repeats: None = None) → TTSplit

lenskit.splitting.records.sample_records(data: Dataset, size: int, *, repeats: int, disjoint: bool = True, rng_spec: RandomSeed | None = None) → Iterator[TTSplit]

Sample a frame of ratings into train-test partitions. This function does not care what kind of data is in data, so long as it is a Pandas DataFrame (or equivalent).

We can loop over a sequence of train-test pairs:

>>> from lenskit.data import load_movielens
>>> movielens = load_movielens('data/ml-latest-small')
>>> for train, test in sample_records(movielens, 1000, repeats=5):
...     print(sum(len(il) for il in test.values()))
1000
1000
1000
1000
1000

Sometimes for testing, it is useful to just get a single pair:

>>> train, test = sample_records(movielens, 1000)
>>> sum(len(il) for il in test.values())
1000

Parameters:

data – The data set to split.
size – The size of each test sample.
repeats – The number of data splits to produce. If None, produce a single train-test pair instead of an iterator or list.
disjoint – If True, force test samples to be disjoint.
rng_spec – The random number generator or seed (see seedbank.numpy_rng()).

Returns:

A train-test pair or iterator of such pairs (depending on repeats).
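
The effect of the disjoint option can be illustrated with a standard-library sketch. It is an assumption-laden illustration rather than LensKit's code: `sample_sketch` is a hypothetical helper, the records are a plain list of integers, an integer seed stands in for rng_spec, and raising ValueError when there are too few records for disjoint samples is this sketch's own choice.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_sketch(
    records: Sequence[T],
    size: int,
    repeats: int,
    disjoint: bool = True,
    seed: Optional[int] = None,
) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield `repeats` (train, test) pairs with `size` test records each."""
    rng = random.Random(seed)
    n = len(records)
    if disjoint:
        if size * repeats > n:
            # Sketch-only behaviour: refuse when disjoint samples cannot fit.
            raise ValueError("not enough records for disjoint test samples")
        # Draw all test positions at once without replacement, then slice them
        # into consecutive chunks so no record appears in two test samples.
        pool = rng.sample(range(n), size * repeats)
        chunks = [pool[r * size:(r + 1) * size] for r in range(repeats)]
    else:
        # Independent draws: the same record may appear in several test samples.
        chunks = [rng.sample(range(n), size) for _ in range(repeats)]
    for chunk in chunks:
        test_idx = set(chunk)
        train = [r for i, r in enumerate(records) if i not in test_idx]
        test = [r for i, r in enumerate(records) if i in test_idx]
        yield train, test


# 100 made-up records, five disjoint test samples of 10 records each.
events = list(range(100))
splits = list(sample_sketch(events, 10, repeats=5, seed=7))
```

In either mode, each training set here is simply every record outside that repeat's test sample, so training sets from different repeats overlap heavily even when the test samples do not.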