lenskit.splitting.users
Functions
| crossfold_users | Partition a frame of ratings or other data into train-test partitions user-by-user. |
| sample_users | Create train-test splits by sampling users. |
- lenskit.splitting.users.crossfold_users(data, partitions, method, *, rng_spec=None)
Partition a frame of ratings or other data into train-test partitions user-by-user. The function places no constraints on the contents of data, provided it is a Pandas DataFrame (or equivalent) with a user column.
- Parameters:
data (Dataset) – A data frame containing ratings or other data you wish to partition.
partitions (int) – The number of partitions to produce.
method (HoldoutMethod) – The method for selecting test rows for each user.
rng_spec (RandomSeed | None) – The RNG or seed (see seedbank.numpy_rng()).
- Return type:
Iterator[TTSplit]
- Returns:
The train-test pairs.
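The idea of user-by-user partitioning can be illustrated with a minimal standard-library sketch. This is not LensKit's implementation: the function `crossfold_users_sketch`, its `n_test` parameter (a stand-in for the `method` argument, here fixed to "hold out N random rows per test user"), and the `(user, item)` tuple rows are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def crossfold_users_sketch(rows, partitions, n_test, seed=None):
    """Yield (train, test) row lists, one pair per user partition.

    Hypothetical helper: each user lands in the test side of exactly one
    partition, where n_test of their rows are held out at random.
    """
    rng = random.Random(seed)

    # Group rows by their user (first element of each row).
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    # Shuffle users, then deal them round-robin into partitions.
    users = sorted(by_user)
    rng.shuffle(users)

    for p in range(partitions):
        test_users = set(users[p::partitions])
        train, test = [], []
        for user, user_rows in by_user.items():
            if user in test_users:
                # Hold out up to n_test rows for this test user.
                k = min(n_test, len(user_rows))
                picked = set(rng.sample(range(len(user_rows)), k))
                for i, row in enumerate(user_rows):
                    (test if i in picked else train).append(row)
            else:
                # Users outside this partition contribute all rows to train.
                train.extend(user_rows)
        yield train, test


# Six users with four rows each, split into three partitions.
rows = [(u, i) for u in range(6) for i in range(4)]
splits = list(crossfold_users_sketch(rows, partitions=3, n_test=1, seed=42))
```

In this sketch, every row appears in exactly one of train or test within each pair, and the remaining rows of a test user stay in the training data, which is what distinguishes user-by-user partitioning from partitioning individual rows.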
- lenskit.splitting.users.sample_users(data: Dataset, size: int, method: HoldoutMethod, *, repeats: int, disjoint: bool = True, rng_spec: RandomSeed | None = None) → Iterator[TTSplit]
- lenskit.splitting.users.sample_users(data: Dataset, size: int, method: HoldoutMethod, *, disjoint: bool = True, rng_spec: RandomSeed | None = None, repeats: None = None) → TTSplit
Create train-test splits by sampling users. When repeats is None, this function returns a single train-test split; otherwise, it returns an iterator over multiple splits. If repeats=1, it returns an iterator that yields a single train-test pair.
- Parameters:
data – Data frame containing ratings or other data you wish to partition.
size – The sample size.
method – The method for obtaining user test ratings.
repeats – The number of samples to produce.
rng_spec – The RNG or seed (see seedbank.numpy_rng()).
- Returns:
The train-test pair(s).
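The difference between the two return shapes (a single split when repeats is None, an iterator otherwise) can be sketched with the standard library alone. This is an illustration rather than LensKit's code: `sample_users_sketch`, its `n_test` parameter (standing in for `method`), and the `(user, item)` tuple rows are hypothetical. The reading of `disjoint` used here, that no user is sampled into the test side of more than one repeat, is an assumption, since the parameter is not described above.

```python
import random
from collections import defaultdict


def sample_users_sketch(rows, size, n_test, *, repeats=None, disjoint=True, seed=None):
    """Return one (train, test) pair, or an iterator of pairs if repeats is set."""
    rng = random.Random(seed)

    # Group rows by their user (first element of each row).
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)
    users = sorted(by_user)

    def make_split(test_users):
        test_users = set(test_users)
        train, test = [], []
        for user, user_rows in by_user.items():
            if user in test_users:
                # Hold out up to n_test random rows for each sampled user.
                k = min(n_test, len(user_rows))
                picked = set(rng.sample(range(len(user_rows)), k))
                for i, row in enumerate(user_rows):
                    (test if i in picked else train).append(row)
            else:
                train.extend(user_rows)
        return train, test

    if repeats is None:
        # Single split: return the pair directly, not an iterator.
        return make_split(rng.sample(users, size))

    if disjoint and repeats * size > len(users):
        raise ValueError("not enough users for disjoint samples")

    def generate():
        if disjoint:
            # Assumed meaning of disjoint: slice one shuffled user list.
            shuffled = users[:]
            rng.shuffle(shuffled)
            for r in range(repeats):
                yield make_split(shuffled[r * size:(r + 1) * size])
        else:
            # Independent samples, which may overlap between repeats.
            for _ in range(repeats):
                yield make_split(rng.sample(users, size))

    return generate()


rows = [(u, i) for u in range(10) for i in range(3)]
single = sample_users_sketch(rows, size=4, n_test=1, seed=7)
multi = list(sample_users_sketch(rows, size=4, n_test=1, repeats=2, seed=7))
one = list(sample_users_sketch(rows, size=4, n_test=1, repeats=1, seed=7))
```

Here `single` is a plain `(train, test)` pair, while `one` is a one-element list built from an iterator, mirroring the distinction between repeats=None and repeats=1 described above.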