lenskit.splitting.users

Functions

crossfold_users(data, partitions, method, *)

Partition a frame of ratings or other data into train-test partitions user-by-user.

sample_users()

Create train-test splits by sampling users.

lenskit.splitting.users.crossfold_users(data, partitions, method, *, rng_spec=None)

Partition a dataset of ratings or other interaction data into train-test partitions user-by-user. The function does not depend on what kind of interaction data the data parameter contains; it only needs each record to be associated with a user.

Parameters:
  • data (Dataset) – The dataset containing ratings or other data you wish to partition.

  • partitions (int) – The number of partitions to produce.

  • method (HoldoutMethod) – The method for selecting test rows for each user.

  • rng_spec (RandomSeed | None) – The RNG or seed (see seedbank.numpy_rng()).

Return type:

Iterator[TTSplit]

Returns:

The train-test pairs.

lenskit.splitting.users.sample_users(data: Dataset, size: int, method: HoldoutMethod, *, repeats: int, disjoint: bool = True, rng_spec: RandomSeed | None = None) → Iterator[TTSplit]
lenskit.splitting.users.sample_users(data: Dataset, size: int, method: HoldoutMethod, *, disjoint: bool = True, rng_spec: RandomSeed | None = None, repeats: None = None) → TTSplit

Create train-test splits by sampling users. When repeats is None, the function returns a single train-test split; otherwise, it returns an iterator over repeats splits. Note that repeats=1 still returns an iterator, which yields a single train-test pair.

Parameters:
  • data – The dataset containing ratings or other data you wish to partition.

  • size – The sample size (the number of users sampled as test users in each split).

  • method – The method for selecting test rows for each sampled user.

  • repeats – The number of samples to produce.

  • disjoint – Whether the user samples should be disjoint.

  • rng_spec – The RNG or seed (see seedbank.numpy_rng()).

Returns:

The train-test pair(s).