lenskit.algorithms.knn

k-NN recommender models.

class lenskit.algorithms.knn.ItemItem(nnbrs, min_nbrs=1, min_sim=1e-06, save_nbrs=None, feedback='explicit', block_size=250, **kwargs)

Bases: Predictor

Item-item nearest-neighbor collaborative filtering with ratings. This item-item implementation is not terribly configurable; it hard-codes design decisions found to work well in the previous Java-based LensKit code [ELKR11]. This implementation is based on the description of item-based CF by Deshpande and Karypis [DK04], and produces results equivalent to Java LensKit.

The k-NN predictor supports two aggregate functions:

weighted-average

The weighted average of the user’s rating values, using item-item similarities as weights.

sum

The sum of the similarities between the target item and the user’s rated items, regardless of the rating the user gave the items.
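As a minimal sketch of the two aggregates (not LensKit's own code), the NumPy functions below take the similarities between one target item and the items the user rated, plus the user's ratings of those items; with centering on, those ratings would be mean-centered values. The names `weighted_average` and `sum_aggregate` are hypothetical, and the similarities are assumed positive, as `min_sim` requires.

```python
import numpy as np


def weighted_average(sims: np.ndarray, ratings: np.ndarray) -> float:
    """Average of the user's ratings, weighted by item-item similarity.

    Assumes every similarity is positive (min_sim enforces this for
    stored neighbors), so the weights sum to a positive number.
    """
    return float(np.dot(sims, ratings) / np.sum(sims))


def sum_aggregate(sims: np.ndarray) -> float:
    """Sum of similarities to the user's rated items; rating values are ignored."""
    return float(np.sum(sims))


# Similarities between one target item and three items the user rated,
# and the user's ratings of those three items.
sims = np.array([0.5, 0.25, 0.25])
ratings = np.array([4.0, 3.0, 5.0])

print(weighted_average(sims, ratings))  # 4.0
print(sum_aggregate(sims))  # 1.0
```

Because `sum_aggregate` grows with the number of neighbors, its scores are useful for ranking items rather than as estimates on the rating scale.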

Parameters:
  • nnbrs (int) – the maximum number of neighbors for scoring each item (None for unlimited)

  • min_nbrs (int) – the minimum number of neighbors for scoring each item

  • min_sim (float) – Minimum similarity threshold for considering a neighbor. Must be positive; if less than the smallest 32-bit normal (\(1.175 \times 10^{-38}\)), is clamped to that value.

  • save_nbrs (int | None) – the number of neighbors to save per item in the trained model (None for unlimited)

  • feedback (Literal['explicit', 'implicit']) –

    Control how feedback should be interpreted. Specifies defaults for the other settings, which can be overridden individually; can be one of the following values:

    explicit

    Configure for explicit-feedback mode: use rating values, center ratings, and use the weighted-average aggregate method for prediction. This is the default setting.

    implicit

    Configure for implicit-feedback mode: ignore rating values, do not center ratings, and use the sum aggregate method for prediction.

  • center – whether to normalize (mean-center) rating vectors prior to computing similarities and aggregating user rating values. Defaults to True in explicit-feedback mode and False in implicit-feedback mode; turn this off when working with unary data and other data types that don’t respond well to centering.

  • aggregate – the type of aggregation to do. Can be weighted-average (the default in explicit-feedback mode) or sum (the default in implicit-feedback mode).

  • use_ratings – whether or not to use the rating values. If False, it ignores rating values and considers an implicit feedback signal of 1 for every (user,item) pair present.

  • block_size (int)
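The interaction between `feedback` and the individual settings can be sketched as follows. This is an illustration under stated assumptions, not LensKit's constructor: `resolve_settings` and `FEEDBACK_DEFAULTS` are hypothetical names, the defaults table is taken from the parameter descriptions above, and raising an error for a negative `min_sim` is an assumed reading of "must be positive".

```python
from typing import Any, Literal

import numpy as np

# Defaults implied by each feedback mode, per the parameter descriptions.
FEEDBACK_DEFAULTS: dict[str, dict[str, Any]] = {
    "explicit": {"center": True, "aggregate": "weighted-average", "use_ratings": True},
    "implicit": {"center": False, "aggregate": "sum", "use_ratings": False},
}

# Smallest positive normal 32-bit float (2**-126, about 1.175e-38).
FLOAT32_TINY = float(np.finfo(np.float32).tiny)


def resolve_settings(
    feedback: Literal["explicit", "implicit"] = "explicit",
    min_sim: float = 1e-6,
    **overrides: Any,
) -> dict[str, Any]:
    """Hypothetical helper: merge feedback-mode defaults with individual overrides."""
    if feedback not in FEEDBACK_DEFAULTS:
        raise ValueError(f"unknown feedback mode: {feedback!r}")
    unknown = set(overrides) - set(FEEDBACK_DEFAULTS[feedback])
    if unknown:
        raise TypeError(f"unexpected settings: {sorted(unknown)}")
    if min_sim < 0:
        # Assumption: a negative threshold is rejected rather than clamped.
        raise ValueError("min_sim must be positive")
    # Individually supplied settings win over the mode's defaults.
    settings = {**FEEDBACK_DEFAULTS[feedback], **overrides}
    # Clamp tiny thresholds to the smallest normal float32.
    settings["min_sim"] = max(min_sim, FLOAT32_TINY)
    return settings


cfg = resolve_settings("implicit", aggregate="weighted-average")
print(cfg)
# {'center': False, 'aggregate': 'weighted-average', 'use_ratings': False, 'min_sim': 1e-06}
```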

IGNORED_PARAMS = ['feedback']

Names of parameters to ignore in get_params().

EXTRA_PARAMS = ['center', 'aggregate', 'use_ratings']

Names of extra parameters to include in get_params(). Useful when the constructor takes **kwargs.
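To show how the two lists might be used, here is a hypothetical, scikit-learn-style `get_params()` that reads constructor argument names with `inspect.signature`, drops `IGNORED_PARAMS`, and appends `EXTRA_PARAMS`. The `ParamsSketch` and `Demo` classes are illustrative assumptions, not LensKit classes. Reporting the resolved `center` and `aggregate` values instead of `feedback` presumably keeps the returned dictionary accurate even though `feedback` only supplied defaults.

```python
import inspect
from typing import Any


class ParamsSketch:
    """Hypothetical base class with a get_params() driven by two class lists."""

    IGNORED_PARAMS: list = []
    EXTRA_PARAMS: list = []

    def get_params(self) -> dict[str, Any]:
        sig = inspect.signature(type(self).__init__)
        # Named constructor arguments, skipping self, *args, **kwargs and ignored names.
        names = [
            name
            for name, p in sig.parameters.items()
            if name != "self"
            and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
            and name not in self.IGNORED_PARAMS
        ]
        # Settings that arrive through **kwargs are listed explicitly.
        names.extend(self.EXTRA_PARAMS)
        return {name: getattr(self, name) for name in names}


class Demo(ParamsSketch):
    IGNORED_PARAMS = ["feedback"]
    EXTRA_PARAMS = ["center", "aggregate"]

    def __init__(self, nnbrs, feedback="explicit", **kwargs):
        self.nnbrs = nnbrs
        self.feedback = feedback
        explicit = feedback == "explicit"
        # feedback only supplies defaults; keyword arguments override them.
        self.center = kwargs.get("center", explicit)
        self.aggregate = kwargs.get("aggregate", "weighted-average" if explicit else "sum")


print(Demo(20, feedback="implicit").get_params())
# {'nnbrs': 20, 'center': False, 'aggregate': 'sum'}
```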

items_: Vocabulary

Vocabulary of item IDs.

item_means_: Tensor | None

Mean rating for each known item.

item_counts_: Tensor

Number of saved neighbors for each item.

sim_matrix_: Tensor

Similarity matrix (sparse CSR tensor).
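In a CSR layout, the stored neighbors of item `i` occupy positions `indptr[i]` up to `indptr[i + 1]` of the column-index and value arrays, so per-item neighbor counts are the differences between consecutive row pointers. The sketch below uses plain NumPy arrays standing in for a PyTorch CSR tensor's `crow_indices()`, `col_indices()`, and `values()`. It assumes rows are target items and columns are their neighbors, and that `item_counts_` matches the row lengths; both are inferences from the descriptions above, and `neighbors` is a hypothetical helper.

```python
import numpy as np

# A 3-item similarity matrix in CSR form (row = item, column = neighbor).
indptr = np.array([0, 2, 3, 5])             # like crow_indices()
indices = np.array([1, 2, 0, 0, 1])         # like col_indices()
values = np.array([0.8, 0.4, 0.8, 0.4, 0.1])  # like values()


def neighbors(item: int) -> tuple[np.ndarray, np.ndarray]:
    """Return (neighbor item numbers, similarities) stored for one item."""
    start, end = indptr[item], indptr[item + 1]
    return indices[start:end], values[start:end]


# Number of stored neighbors per item: the length of each CSR row.
counts = np.diff(indptr)
print(counts)  # [2 1 2]
print(neighbors(2))
```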

users_: Vocabulary

Vocabulary of user IDs.

rating_matrix_: Tensor

Normalized rating matrix to look up user ratings at prediction time.

fit(data, **kwargs)

Train a model.

The model-training process depends on save_nbrs and min_sim, but not on other algorithm parameters.

Parameters:
  • data (Dataset) – (user, item, rating) data for computing item similarities.
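A dense NumPy sketch of the training step may help. It follows the item-based scheme described above under these assumptions: each item's ratings are mean-centered (when `center` is on), item vectors are scaled to unit length so dot products are cosine similarities, self-similarities and similarities below `min_sim` are dropped, and each item keeps at most `save_nbrs` neighbors. `train_item_sims` is a hypothetical name, and the dense matrix is used only for clarity, whereas the trained model stores a sparse one.

```python
from typing import Optional

import numpy as np


def train_item_sims(
    R: np.ndarray,
    min_sim: float = 1e-6,
    save_nbrs: Optional[int] = None,
    center: bool = True,
) -> tuple[np.ndarray, np.ndarray]:
    """Sketch of item-item training on a dense user x item matrix.

    Missing ratings are NaN. Returns (item_means, sims), where sims[i, j] is
    the similarity of item j as a neighbor of item i, or 0 if not kept.
    """
    observed = ~np.isnan(R)
    counts = observed.sum(axis=0)
    sums = np.where(observed, R, 0.0).sum(axis=0)
    item_means = np.divide(sums, counts, out=np.zeros(R.shape[1]), where=counts > 0)

    # Center observed ratings by their item's mean; missing entries become 0.
    values = R - item_means if center else R
    X = np.where(observed, values, 0.0)

    # Scale each item column to unit length so X.T @ X holds cosine similarities.
    norms = np.linalg.norm(X, axis=0)
    X = np.divide(X, norms, out=np.zeros_like(X), where=norms > 0)
    sims = X.T @ X

    np.fill_diagonal(sims, 0.0)  # an item is not its own neighbor
    sims[sims < min_sim] = 0.0  # drop weak and negative similarities

    if save_nbrs is not None:
        for row in sims:  # each row is a view, so edits modify sims in place
            order = np.argsort(row)[::-1]
            row[order[save_nbrs:]] = 0.0  # keep only the save_nbrs largest
    return item_means, sims


nan = np.nan
R = np.array([
    [5.0, 4.0, nan],
    [4.0, 5.0, 1.0],
    [1.0, 2.0, 5.0],
])
means, sims = train_item_sims(R, save_nbrs=1)
print(np.round(sims, 3))
```

In this toy data, item 2's centered ratings run opposite to the other two items, so its negative similarities fall below `min_sim` and it ends up with no saved neighbors.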

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters:
  • user – the user ID

  • items (array-like) – the items to predict

  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.

Returns:

scores for the items, indexed by item id.

Return type:

pandas.Series
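The neighbor selection that `nnbrs` and `min_nbrs` control can be sketched for a single target item in explicit-feedback mode. This is an illustrative assumption rather than LensKit's code: `score_item` is hypothetical, the user's ratings are assumed to be centered by each neighbor item's mean, the target item's mean is assumed to be added back to the weighted average, and returning `None` stands in for "no score".

```python
from typing import Optional

import numpy as np


def score_item(
    sims_to_target: np.ndarray,
    user_ratings: np.ndarray,
    item_mean: float,
    nnbrs: Optional[int] = None,
    min_nbrs: int = 1,
) -> Optional[float]:
    """Score one target item for one user with the weighted-average aggregate.

    sims_to_target[k]: stored similarity between the target item and the k-th
    item the user rated (0 if that item was not saved as a neighbor).
    user_ratings[k]: the user's rating of that item, centered by its item mean.
    """
    usable = sims_to_target > 0
    sims = sims_to_target[usable]
    ratings = user_ratings[usable]
    if nnbrs is not None:
        # Keep the nnbrs most similar of the user's rated items.
        top = np.argsort(sims)[::-1][:nnbrs]
        sims, ratings = sims[top], ratings[top]
    if len(sims) < max(min_nbrs, 1):
        return None  # too few neighbors to score this item
    return item_mean + float(np.dot(sims, ratings) / np.sum(sims))


sims_to_target = np.array([0.75, 0.0, 0.25, 0.125])
user_ratings = np.array([1.0, -2.0, -1.0, 2.0])
print(score_item(sims_to_target, user_ratings, item_mean=3.5, nnbrs=2))  # 4.0
print(score_item(sims_to_target, user_ratings, item_mean=3.5, min_nbrs=4))  # None
```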

class lenskit.algorithms.knn.UserUser(nnbrs, min_nbrs=1, min_sim=1e-06, feedback='explicit', **kwargs)

Bases: Predictor

User-user nearest-neighbor collaborative filtering with ratings. This user-user implementation is not terribly configurable; it hard-codes design decisions found to work well in the previous Java-based LensKit code.

Parameters:
  • nnbrs (int) – the maximum number of neighbors for scoring each item (None for unlimited).

  • min_nbrs (int) – the minimum number of neighbors for scoring each item.

  • min_sim (float) – Minimum similarity threshold for considering a neighbor. Must be positive; if less than the smallest 32-bit normal (\(1.175 \times 10^{-38}\)), is clamped to that value.

  • feedback (Literal['explicit', 'implicit']) –

    Control how feedback should be interpreted. Specifies defaults for the other settings, which can be overridden individually; can be one of the following values:

    explicit

    Configure for explicit-feedback mode: use rating values, center ratings, and use the weighted-average aggregate method for prediction. This is the default setting.

    implicit

    Configure for implicit-feedback mode: ignore rating values, do not center ratings, and use the sum aggregate method for prediction.

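  • center – whether to normalize (mean-center) rating vectors. Defaults to True in explicit-feedback mode and False in implicit-feedback mode; turn this off when working with unary data and other data types that don’t respond well to centering.

  • aggregate – the type of aggregation to do. Can be weighted-average (the default in explicit-feedback mode) or sum (the default in implicit-feedback mode).

  • use_ratings – whether or not to use rating values. Defaults to True in explicit-feedback mode and False in implicit-feedback mode. If False, it ignores rating values and treats every present rating as 1.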

IGNORED_PARAMS = ['feedback']

Names of parameters to ignore in get_params().

EXTRA_PARAMS = ['center', 'aggregate', 'use_ratings']

Names of extra parameters to include in get_params(). Useful when the constructor takes **kwargs.

users_: Vocabulary

The index of user IDs.

items_: Vocabulary

The index of item IDs.

user_means_: Tensor | None

Mean rating for each known user.

user_vectors_: Tensor

Normalized rating matrix (CSR) to find neighbors at prediction time.

user_ratings_: Tensor

Centered but un-normalized rating matrix (COO) to find neighbor ratings.

fit(data, **kwargs)

“Train” a user-user CF model. This memorizes the rating data in a format that is usable for future computations.

Parameters:
  • data (Dataset) – the rating data to memorize.

Return type:

Self
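What "memorizing" might amount to can be sketched with dense NumPy arrays standing in for the sparse tensors described above: per-user means (like `user_means_`), ratings centered by those means (like `user_ratings_`), and the same centered rows scaled to unit length (like `user_vectors_`), so that neighbor search reduces to dot products. `memorize` is a hypothetical helper, and centering by the user mean is an assumption based on the explicit-feedback defaults.

```python
import numpy as np


def memorize(R: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Sketch of user-user "training" on a dense user x item matrix (NaN = missing).

    Returns (means, centered, normalized):
    - centered: ratings minus each user's mean, 0 where missing;
    - normalized: the centered rows scaled to unit length.
    """
    observed = ~np.isnan(R)
    counts = observed.sum(axis=1)
    sums = np.where(observed, R, 0.0).sum(axis=1)
    means = np.divide(sums, counts, out=np.zeros(R.shape[0]), where=counts > 0)
    centered = np.where(observed, R - means[:, None], 0.0)
    # Unit-length rows make a dot product between two users a cosine similarity.
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    normalized = np.divide(centered, norms, out=np.zeros_like(centered), where=norms > 0)
    return means, centered, normalized


R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 2.0]])
means, centered, normalized = memorize(R)
print(means)  # [4. 3.]
print(normalized @ normalized[0])  # cosine similarity of every user to user 0
```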

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters:
  • user – the user ID

  • items (array-like) – the items to predict

  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, will be used to recompute the user’s bias at prediction time.

Returns:

scores for the items, indexed by item id.

Return type:

pandas.Series
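Prediction for one (user, item) pair can be sketched as: compute the target user's cosine similarity to every other user, keep those who rated the item and meet `min_sim`, take at most `nnbrs` of the most similar, require at least `min_nbrs`, and add the weighted average of their centered ratings to the target user's mean. `predict_user_user` and its dense inputs are hypothetical stand-ins for `user_vectors_`, `user_ratings_`, and `user_means_`; adding back the user mean is an assumption consistent with the explicit-feedback defaults.

```python
from typing import Optional

import numpy as np


def predict_user_user(
    target: int,
    item: int,
    centered: np.ndarray,
    normalized: np.ndarray,
    means: np.ndarray,
    rated: np.ndarray,
    nnbrs: Optional[int] = None,
    min_nbrs: int = 1,
    min_sim: float = 1e-6,
) -> Optional[float]:
    """Weighted-average user-user score for one (user, item) pair, or None."""
    sims = normalized @ normalized[target]  # cosine similarity to every user
    sims[target] = 0.0  # a user is not their own neighbor
    # Candidates: users similar enough who have rated the item.
    cand = np.flatnonzero((sims >= min_sim) & rated[:, item])
    if nnbrs is not None:
        cand = cand[np.argsort(sims[cand])[::-1][:nnbrs]]
    if len(cand) < max(min_nbrs, 1):
        return None  # too few neighbors
    w = sims[cand]
    return float(means[target] + np.dot(w, centered[cand, item]) / np.sum(w))


R = np.array([
    [4.0, 2.0, np.nan],  # target user 0 has not rated item 2
    [5.0, 3.0, 5.0],
    [1.0, 5.0, 3.0],
])
rated = ~np.isnan(R)
means = np.nanmean(R, axis=1)
centered = np.where(rated, R - means[:, None], 0.0)
normalized = centered / np.linalg.norm(centered, axis=1, keepdims=True)

print(predict_user_user(0, 2, centered, normalized, means, rated))
```

Here only user 1 qualifies as a neighbor for item 2 (user 2's similarity to user 0 is negative), so the score is user 0's mean of 3 plus user 1's centered rating of 2/3.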

Modules

item

Item-based k-NN collaborative filtering.

user

User-based k-NN collaborative filtering.