lenskit.basic#

Basic and baseline pipeline components.

class lenskit.basic.BiasModel(damping, global_bias, items=None, item_biases=None, users=None, user_biases=None)#

Bases: object

User-item bias models learned from rating data. The BiasScorer class uses this model to score items in a pipeline; the model is reusable in other components that need user-item bias models.

This implements the following model:

\[b_{ui} = b_g + b_i + b_u\]

where \(b_g\) is the global bias (global mean rating), \(b_i\) is the item bias, and \(b_u\) is the user bias. With the provided damping values \(\beta_{\mathrm{u}}\) and \(\beta_{\mathrm{i}}\), they are computed as follows:

\[\begin{align*} b_g & = \frac{\sum_{r_{ui} \in R} r_{ui}}{|R|} & b_i & = \frac{\sum_{r_{ui} \in R_i} (r_{ui} - b_g)}{|R_i| + \beta_{\mathrm{i}}} & b_u & = \frac{\sum_{r_{ui} \in R_u} (r_{ui} - b_g - b_i)}{|R_u| + \beta_{\mathrm{u}}} \end{align*}\]

The damping values can be interpreted as the number of default (mean) ratings to assume a priori for each user or item, damping low-information users and items towards a mean instead of permitting them to take on extreme values based on few ratings.
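The formulas above can be sketched in plain Python as follows. This is an illustrative sketch only, not LensKit's implementation: the function name `learn_biases`, the `(user, item, rating)` triple format, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict


def learn_biases(ratings, user_damping=0.0, item_damping=0.0):
    """Compute (b_g, item_biases, user_biases) from (user, item, rating) triples.

    Hypothetical helper that mirrors the three formulas in the text.
    """
    # Global bias: the mean of all ratings.
    b_g = sum(r for _, _, r in ratings) / len(ratings)

    # Item bias: damped mean of (rating - global bias) for each item.
    item_sum, item_n = defaultdict(float), defaultdict(int)
    for _, i, r in ratings:
        item_sum[i] += r - b_g
        item_n[i] += 1
    b_i = {i: item_sum[i] / (item_n[i] + item_damping) for i in item_sum}

    # User bias: damped mean of what is left after removing b_g and b_i.
    user_sum, user_n = defaultdict(float), defaultdict(int)
    for u, i, r in ratings:
        user_sum[u] += r - b_g - b_i[i]
        user_n[u] += 1
    b_u = {u: user_sum[u] / (user_n[u] + user_damping) for u in user_sum}
    return b_g, b_i, b_u


# Made-up sample data, for illustration only.
ratings = [("u1", "a", 5.0), ("u1", "b", 3.0), ("u2", "a", 4.0), ("u2", "c", 1.0)]
undamped = learn_biases(ratings)
b_g, b_i, b_u = learn_biases(ratings, user_damping=5.0, item_damping=5.0)
```

In this sample, item `c` has a single low rating. Its undamped bias is the full residual of that rating, while a damping of 5 shrinks it to one sixth of that value, which is the "assume five mean ratings a priori" reading described above.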

Parameters:
damping: UITuple[float]#

The mean damping terms.

global_bias: float#

The global bias term.

items: Vocabulary | None = None#

Vocabulary of items.

item_biases: ndarray[int, dtype[float32]] | None = None#

The item offsets (\(b_i\) values).

users: Vocabulary | None = None#

Vocabulary of users.

user_biases: ndarray[int, dtype[float32]] | None = None#

The user offsets (\(b_u\) values).

classmethod learn(data, damping=0.0, *, items=True, users=True)#

Learn a bias model and its parameters from a dataset.

Parameters:
  • data (Dataset) – The dataset from which to learn the bias model.

  • damping (float | UITuple[float] | tuple[float, float]) – Bayesian damping to apply to computed biases. Either a number, to damp both user and item biases the same amount, or a (user,item) tuple providing separate damping values.

  • items (bool) – Whether to compute item biases

  • users (bool) – Whether to compute user biases

Return type:

Self

compute_for_items(items, user_id=None, user_items=None, *, bias=None)#

Compute the personalized biases for a set of items and optionally a user. The user can be specified either by their identifier or by a list of ratings.

Parameters:
  • items (ItemList) – The items to score.

  • user_id (int | str | bytes | integer[Any] | str_ | bytes_ | object_ | None) – The user identifier.

  • user_items (ItemList | None) – The user’s items, with ratings (takes precedence over user_id if both are supplied). If the supplied list does not have a rating field, it is ignored.

  • bias (float | None) – A pre-computed user bias.

Returns:

A tuple of the overall bias scores for the specified items and user, and the user’s bias (needed to de-normalize scores efficiently later). If a user bias is provided instead of user information, only the composite bias scores are returned.
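The two return shapes can be illustrated with a small stand-alone sketch. It is not the LensKit method: `bias_scores` and its dictionary-based inputs are hypothetical, and treating unknown items as having zero item bias is an assumption of this example.

```python
def bias_scores(items, b_g, item_biases, user_ratings=None, user_bias=None,
                user_damping=0.0):
    """Hypothetical analogue of compute_for_items using plain dictionaries."""
    if user_bias is not None:
        # A pre-computed user bias was supplied: return only the scores.
        return {i: b_g + item_biases.get(i, 0.0) + user_bias for i in items}

    # Otherwise derive the user bias from the user's ratings, if there are any.
    computed = 0.0
    if user_ratings:
        residual = sum(r - b_g - item_biases.get(i, 0.0)
                       for i, r in user_ratings.items())
        computed = residual / (len(user_ratings) + user_damping)

    scores = {i: b_g + item_biases.get(i, 0.0) + computed for i in items}
    # Return the user bias too, so callers can reuse it later.
    return scores, computed


# Made-up model parameters and user history, for illustration only.
item_biases = {"a": 1.25, "b": -0.25, "c": -2.25}
scores, user_b = bias_scores(["c"], 3.25, item_biases,
                             user_ratings={"a": 5.0, "b": 3.0})
scores_only = bias_scores(["c"], 3.25, item_biases, user_bias=user_b)
```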

transform_matrix(matrix)#

Transform a sparse ratings matrix by subtracting biases.

Parameters:

matrix (Tensor)

class lenskit.basic.BiasScorer(items=True, users=True, damping=0.0, *, user_damping=None, item_damping=None)#

Bases: Component

A user-item bias rating prediction model. This component uses BiasModel to predict ratings for users and items.

Parameters:
  • items (bool) – Whether to compute item biases.

  • users (bool) – Whether to compute user biases.

  • damping (UITuple[float]) – Bayesian damping to apply to computed biases. Either a number, to damp both user and item biases the same amount, or a (user,item) tuple providing separate damping values.

  • user_damping (float | None)

  • item_damping (float | None)

IGNORED_CONFIG_FIELDS: ClassVar[list[str]] = ['user_damping', 'item_damping']#

Names of constructor parameters that should be excluded from the configuration dictionary.

damping: UITuple[float]#

The configured offset damping levels.

train(data)#

Train the bias model on some rating data.

Parameters:
  • data (Dataset) – The training data (must have ratings).

Returns:

The trained bias object.

class lenskit.basic.PopScorer(score_method='quantile')#

Bases: Component, Trainable

Score items by their popularity. Use with TopN to get a most-popular-items recommender.

Parameters:
  • score_method (str) –

    The method for computing popularity scores. Can be one of the following:

    • 'quantile' (the default)

    • 'rank'

    • 'count'
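The documentation names the three scoring methods without defining them. The sketch below shows one plausible reading, offered as an assumption rather than as LensKit's definitions: 'count' is the raw interaction count, 'rank' is the position when items are ordered from least to most popular, and 'quantile' is the cumulative share of interactions at or below the item's popularity. The function `popularity_scores` is hypothetical.

```python
from collections import Counter


def popularity_scores(interactions, score_method="quantile"):
    """Score items by popularity from a flat list of interacted item ids."""
    counts = Counter(interactions)
    if score_method == "count":
        return {item: float(c) for item, c in counts.items()}

    # Order from least to most popular; ties are broken by item id (assumption).
    ordered = sorted(counts, key=lambda item: (counts[item], item))
    if score_method == "rank":
        return {item: float(pos) for pos, item in enumerate(ordered, start=1)}

    if score_method == "quantile":
        total = sum(counts.values())
        running, scores = 0, {}
        for item in ordered:
            running += counts[item]
            scores[item] = running / total  # cumulative share of interactions
        return scores

    raise ValueError(f"unknown score_method: {score_method!r}")


# Made-up interaction log, for illustration only.
clicks = ["a", "a", "a", "b", "b", "c"]
quantiles = popularity_scores(clicks)
```

Under all three readings a more popular item receives a higher score, so a top-N ranking over the scores yields the same most-popular ordering; the methods differ only in the scale of the scores.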

item_pop_#

Item popularity scores.

property is_trained: bool#

Check if this model has already been trained.

train(data)#

Train the pipeline component to learn its parameters from a training dataset.

Parameters:
  • data (Dataset) – The training dataset.

  • retrain – If True, retrain the model even if it has already been trained.

class lenskit.basic.TopNRanker(n=-1)#

Bases: Component

Rank scored items by their score and take the top N. The ranking length can be passed either at runtime or at component instantiation time, with the latter taking precedence.

Parameters:

n (int) – The desired ranking length. If negative, then scored items are ranked but the ranking is not truncated.
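The truncation rule for n can be sketched as follows. The function `top_n` and its dictionary input are hypothetical stand-ins for the component and its scored item list.

```python
def top_n(scores, n=-1):
    """Rank items by descending score; truncate only when n is non-negative."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    items = [item for item, _ in ranked]
    return items if n < 0 else items[:n]


# Made-up scores, for illustration only.
scored = {"a": 0.2, "b": 0.9, "c": 0.5}
full_ranking = top_n(scored)
top_two = top_n(scored, 2)
```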

class lenskit.basic.RandomSelector(n=-1, rng=None)#

Bases: Component

Randomly select items from a candidate list.

Parameters:
  • n (int) – The number of items to select, or -1 to randomly permute the items.

  • rng (int | integer[Any] | Sequence[int] | SeedSequence | Literal['user'] | tuple[int | integer[Any] | Sequence[int] | SeedSequence, Literal['user']] | None) – The random number generator or specification (see Random Seeds). This class supports derivable RNGs.
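The behaviour of n can be sketched with the standard library. This does not use LensKit's RNG handling: `random_select` and its `seed` argument are hypothetical, and derivable RNGs are not modelled.

```python
import random


def random_select(candidates, n=-1, seed=None):
    """Pick n candidates uniformly without replacement; n < 0 permutes them all."""
    rng = random.Random(seed)
    pool = list(candidates)
    k = len(pool) if n < 0 else min(n, len(pool))
    # Sampling the whole pool without replacement is a random permutation.
    return rng.sample(pool, k)


# Made-up candidate list, for illustration only.
candidates = ["a", "b", "c", "d"]
permuted = random_select(candidates, seed=1)
picked = random_select(candidates, n=2, seed=1)
```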

class lenskit.basic.SoftmaxRanker(n=-1, rng=None)#

Bases: Component

Stochastic top-N ranking with softmax sampling.

This uses the “softmax” sampling method, a more efficient approximation of Plackett-Luce sampling than even the Gumbel trick, as documented by Tim Vieira. It expects a scored list of input items and samples n items, with selection probabilities proportional to their scores.

Note

Negative scores are clamped to (approximately) zero.

Parameters:
  • n (int) – The number of items to return (-1 to return all items).

  • rng (int | integer[Any] | Sequence[int] | SeedSequence | Literal['user'] | tuple[int | integer[Any] | Sequence[int] | SeedSequence, Literal['user']] | None) – The random number generator or specification (see Random Seeds). This class supports derivable RNGs.
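One standard way to sample a ranking with selection probabilities proportional to weights is to draw a uniform value per item, compute the key log(u) / weight, and sort by key in descending order (the Efraimidis-Spirakis scheme). The sketch below uses that technique as an assumption about the general approach; it is not taken from LensKit's source. The function `weighted_sample_ranking`, the `EPSILON` clamp value, and the `seed` argument are hypothetical.

```python
import math
import random

EPSILON = 1e-12  # hypothetical floor used to clamp non-positive scores


def weighted_sample_ranking(scores, n=-1, seed=None):
    """Sample a ranking without replacement, weighting items by their scores."""
    rng = random.Random(seed)
    keys = {}
    for item, score in scores.items():
        weight = max(score, EPSILON)  # negative scores become (almost) zero
        u = 1.0 - rng.random()        # uniform in (0, 1], so log(u) is finite
        keys[item] = math.log(u) / weight  # <= 0; heavier items sit nearer 0
    ranked = sorted(keys, key=keys.get, reverse=True)
    return ranked if n < 0 else ranked[:n]


# Made-up scores, for illustration only; "c" is negative and gets clamped.
scored = {"a": 1e6, "b": 1.0, "c": -3.0}
ranking = weighted_sample_ranking(scored, seed=7)
first_two = weighted_sample_ranking(scored, n=2, seed=7)
```

A single sort over independently drawn keys replaces n rounds of renormalised sampling, which is where the efficiency of this family of tricks comes from.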

class lenskit.basic.UserTrainingHistoryLookup(*args, **kwargs)#

Bases: Component, Trainable

Look up a user’s history from the training data.

property is_trained: bool#

Check if this model has already been trained.

train(data)#

Train the pipeline component to learn its parameters from a training dataset.

Parameters:
  • data (Dataset) – The training dataset.

  • retrain – If True, retrain the model even if it has already been trained.

class lenskit.basic.UnratedTrainingItemsCandidateSelector(*args, **kwargs)#

Bases: TrainingCandidateSelectorBase

Candidate selector that selects all known items from the training data that do not appear in the request user’s history (RecQuery.user_items). If no item history is available, then all training items are returned.

In order to look up the user’s history in the training data, this needs to be combined with a component like UserTrainingHistoryLookup.
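The selection rule amounts to a set difference with a fallback to the full item set. A minimal sketch, with the hypothetical function `unrated_candidates` and plain lists standing in for the training data and the query's item history:

```python
def unrated_candidates(training_items, user_items=None):
    """Return training items absent from the user's history, keeping their order."""
    if not user_items:
        # No history available: every training item is a candidate.
        return list(training_items)
    seen = set(user_items)
    return [item for item in training_items if item not in seen]


# Made-up catalogue and history, for illustration only.
catalogue = ["a", "b", "c", "d"]
with_history = unrated_candidates(catalogue, ["b", "d"])
without_history = unrated_candidates(catalogue)
```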

class lenskit.basic.AllTrainingItemsCandidateSelector(*args, **kwargs)#

Bases: TrainingCandidateSelectorBase

Candidate selector that selects all known items from the training data.

class lenskit.basic.FallbackScorer(*args, **kwargs)#

Bases: Component

Scoring component that fills in missing scores using a fallback.
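The idea can be sketched as follows, assuming that a missing score is represented by None or NaN and that the fallback is a second set of scores for the same items. Both are assumptions of this example, and `fill_missing` is a hypothetical name.

```python
import math


def fill_missing(primary, fallback):
    """Keep primary scores where present; otherwise use the fallback score."""
    filled = {}
    for item, score in primary.items():
        missing = score is None or math.isnan(score)
        # If the fallback also lacks the item, the score stays missing.
        filled[item] = fallback.get(item, score) if missing else score
    return filled


# Made-up scores, for illustration only.
primary = {"a": 4.0, "b": float("nan"), "c": None}
fallback = {"a": 1.0, "b": 3.2, "c": 2.5}
combined = fill_missing(primary, fallback)
```

A typical pairing under this reading would be a personalized scorer as the primary and a bias or popularity scorer as the fallback, so that every candidate ends up with a usable score.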

Modules

bias

Bias scoring model.

candidates

composite

history

Components that look up user history from the training data.

popularity

random

topn

Basic Top-N ranking.