lenskit.splitting.users

Functions

`crossfold_users`(data, partitions, method, *)	Partition a frame of ratings or other data into train-test partitions user-by-user.
`sample_users`()	Create train-test splits by sampling users.

lenskit.splitting.users.crossfold_users(data, partitions, method, *, rng_spec=None)

Partition a frame of ratings or other data into train-test partitions user-by-user. The function places no requirement on the contents of `data` beyond it being a Pandas DataFrame (or equivalent) with a user column.

Parameters:

data (Dataset) – A data frame containing ratings or other data you wish to partition.
partitions (int) – The number of partitions to produce.
method (HoldoutMethod) – The method for selecting test rows for each user.
rng_spec (RandomSeed | None) – The RNG or seed (see seedbank.numpy_rng()).

Return type:

Iterator[TTSplit]

Returns: The train-test pairs.
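The idea of user-by-user partitioning can be illustrated with a minimal standard-library sketch. This is not LensKit's implementation and does not call LensKit: the function `crossfold_users_sketch`, the list-of-dicts row format, and the `n_test` argument (a stand-in for a holdout method that samples a fixed number of rows per test user) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def crossfold_users_sketch(rows, partitions, n_test, seed=None):
    """Yield (train, test) row lists, one pair per disjoint fold of users.

    Hypothetical helper: `rows` is a list of dicts with a "user" key, and
    `n_test` rows are held out at random for each user in the test fold.
    """
    rng = random.Random(seed)

    # Group rows by user so each user can be handled as a unit.
    by_user = defaultdict(list)
    for row in rows:
        by_user[row["user"]].append(row)

    # Shuffle users, then deal them round-robin into disjoint folds.
    users = sorted(by_user)
    rng.shuffle(users)
    folds = [users[i::partitions] for i in range(partitions)]

    for fold in folds:
        test_users = set(fold)
        train, test = [], []
        for user, user_rows in by_user.items():
            if user in test_users:
                # Hold out up to n_test rows; the rest stay in training.
                held_out = rng.sample(user_rows, min(n_test, len(user_rows)))
                held_ids = {id(r) for r in held_out}
                test.extend(held_out)
                train.extend(r for r in user_rows if id(r) not in held_ids)
            else:
                # Users outside the test fold contribute all rows to training.
                train.extend(user_rows)
        yield train, test


# Toy data: 6 users with 4 ratings each.
ratings = [
    {"user": u, "item": i, "rating": (u + i) % 5 + 1}
    for u in range(6)
    for i in range(4)
]
splits = list(crossfold_users_sketch(ratings, partitions=3, n_test=1, seed=42))
```

In this sketch every user appears in the test set of exactly one fold, and each train-test pair together still covers every input row.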

lenskit.splitting.users.sample_users(data: Dataset, size: int, method: HoldoutMethod, *, repeats: int, disjoint: bool = True, rng_spec: RandomSeed | None = None) → Iterator[TTSplit]

lenskit.splitting.users.sample_users(data: Dataset, size: int, method: HoldoutMethod, *, disjoint: bool = True, rng_spec: RandomSeed | None = None, repeats: None = None) → TTSplit

Create train-test splits by sampling users. When repeats is None, the function returns a single train-test split; otherwise, it returns an iterator over the splits. Note that repeats=1 still returns an iterator, which yields a single train-test pair.

Parameters:

data – Data frame containing ratings or other data you wish to partition.
size – The sample size.
method – The method for obtaining user test ratings.
repeats – The number of samples to produce.
rng_spec – The RNG or seed (see seedbank.numpy_rng()).

Returns:

The train-test pair(s).
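The difference between the two return shapes can be sketched with the standard library alone. The function `sample_users_sketch` below is hypothetical and is not LensKit's implementation: it assumes rows are dicts with a "user" key, uses "first `n_test` rows per sampled user" as a stand-in holdout method, and does not model the `disjoint` option.

```python
import random


def sample_users_sketch(rows, size, n_test, *, repeats=None, seed=None):
    """Return one (train, test) pair, or an iterator of pairs if repeats is set.

    Hypothetical helper mirroring only the return-shape behaviour described
    above; the holdout rule is a simple placeholder.
    """
    rng = random.Random(seed)
    users = sorted({row["user"] for row in rows})

    def one_split():
        # Sample `size` users, then hold out their first n_test rows.
        test_users = set(rng.sample(users, size))
        held = {}
        train, test = [], []
        for row in rows:
            user = row["user"]
            if user in test_users and held.get(user, 0) < n_test:
                held[user] = held.get(user, 0) + 1
                test.append(row)
            else:
                train.append(row)
        return train, test

    if repeats is None:
        # No repeats requested: return the pair itself, not an iterator.
        return one_split()
    # Any integer, including 1, produces an iterator over pairs.
    return (one_split() for _ in range(repeats))


ratings = [
    {"user": u, "item": i, "rating": (u + i) % 5 + 1}
    for u in range(6)
    for i in range(4)
]

single = sample_users_sketch(ratings, size=2, n_test=1, seed=7)
repeated = list(sample_users_sketch(ratings, size=2, n_test=1, repeats=1, seed=7))
```

Here `single` is a pair that can be unpacked directly, while the `repeats=1` call must be iterated (or wrapped in `list`) to reach its one pair.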