lenskit.splitting.sample_records
- lenskit.splitting.sample_records(data: Dataset, size: int, *, disjoint: bool = True, test_only: bool = False, rng: RNGInput = None, repeats: None = None) → TTSplit
- lenskit.splitting.sample_records(data: Dataset, size: int, *, repeats: int, disjoint: bool = True, test_only: bool = False, rng: RNGInput = None) → Iterator[TTSplit]
Sample a frame of ratings into train-test partitions. This function does not care what kind of data is in data, so long as it is a Pandas DataFrame (or equivalent).
We can loop over a sequence of train-test pairs:
>>> from lenskit.data import load_movielens
>>> from lenskit.splitting import sample_records
>>> movielens = load_movielens('data/ml-latest-small')
>>> for split in sample_records(movielens, 1000, repeats=5):
...     print(sum(len(il) for il in split.test.lists()))
1000
1000
1000
1000
1000
Sometimes for testing, it is useful to just get a single pair:
>>> split = sample_records(movielens, 1000)
>>> sum(len(il) for il in split.test.lists())
1000
- Parameters:
data – The data set to split.
size – The size of each test sample.
repeats – The number of data splits to produce. If None, produce a single train-test pair instead of an iterator or list.
disjoint – If True, force test samples to be disjoint.
test_only – If True, returns splits with empty training sets (useful when you just want to save the test data).
rng – The random number generator or seed (see Random Seeds).
- Returns:
A train-test pair or iterator of such pairs (depending on repeats).
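To make the effect of disjoint concrete, the following is a minimal sketch of one way to draw repeated test samples over record indices with NumPy. It is not LensKit's implementation: the helper name sample_index_splits, its arguments, and the choice to raise an error when there are too few records for disjoint samples are all assumptions made for illustration.

```python
import numpy as np


def sample_index_splits(n_records, size, repeats, disjoint=True, seed=None):
    """Yield (train_idx, test_idx) index pairs.

    Hypothetical helper for illustration only; not part of LensKit.
    """
    rng = np.random.default_rng(seed)
    if disjoint:
        # Assumption of this sketch: fail loudly if disjoint samples cannot fit.
        if size * repeats > n_records:
            raise ValueError("not enough records for disjoint test samples")
        # Shuffle once, then cut consecutive non-overlapping chunks.
        perm = rng.permutation(n_records)
        tests = [perm[i * size:(i + 1) * size] for i in range(repeats)]
    else:
        # Independent draws: a record may appear in several test samples.
        tests = [
            rng.choice(n_records, size=size, replace=False)
            for _ in range(repeats)
        ]
    for test_idx in tests:
        # Every record outside the test sample goes to the training side.
        mask = np.ones(n_records, dtype=bool)
        mask[test_idx] = False
        yield np.flatnonzero(mask), test_idx


splits = list(sample_index_splits(100, 10, repeats=5, seed=42))
```

With disjoint=True, no record index appears in more than one test sample, while each training side still contains every record outside its own test sample.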