k-NN Collaborative Filtering

LKPY provides user- and item-based classical k-NN collaborative filtering implementations. These lightly configurable implementations are intended to capture the behavior of the Java-based LensKit implementations, both to provide a good upgrade path and to enable basic experiments out of the box.

These algorithms can be used in two primary modes. When using explicit feedback (rating values), you usually want weighted-average aggregation and mean-centering normalization. This is the default mode, and can be selected explicitly by passing feedback='explicit' to the class constructor.

With implicit feedback (unary data such as clicks and purchases, typically represented with rating values of 1 for positive items), the usual design is sum aggregation and no centering. This can be selected with feedback='implicit', which also configures the algorithm to ignore rating values (when present) and treat every rating as 1:

implicit_knn = ItemItem(20, feedback='implicit')

Attempting to center data in which every value is the same (all 1, for example) will typically produce invalid results. ItemItem has diagnostics to warn you about this.
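The following is a minimal sketch of why centering unary data breaks down. It is not LensKit's code; the helper names mean_center and cosine are hypothetical and exist only for illustration. Mean-centering a vector whose entries are all 1 leaves only zeros, so the cosine similarity becomes 0/0 and is undefined.

```python
import math

def mean_center(values):
    """Subtract the vector's mean from each entry."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def cosine(a, b):
    """Cosine similarity; returns None when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return None  # similarity is undefined (0/0)
    return dot / (norm_a * norm_b)

# Two items "rated" by the same three users, every value being the unary 1.
item_a = [1.0, 1.0, 1.0]
item_b = [1.0, 1.0, 1.0]

# Without centering the items are (numerically) perfectly similar.
raw_sim = cosine(item_a, item_b)

# After centering, both vectors are all zeros and no similarity can be computed.
centered_sim = cosine(mean_center(item_a), mean_center(item_b))
```

Here raw_sim is 1 up to floating-point rounding, while centered_sim is None because the centered vectors carry no information.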

The feedback option only sets defaults; the algorithm can be further configured (e.g. to re-enable rating values) with additional parameters to the constructor.
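As an illustration of "defaults that can be overridden", the sketch below resolves the three documented options (use_ratings, center, aggregate) from a feedback mode and then applies any explicit overrides. FEEDBACK_PRESETS and resolve_options are hypothetical names, not part of LensKit; only the preset values themselves are taken from the parameter descriptions in this document.

```python
# Hypothetical presets mirroring the documented defaults for each feedback mode.
FEEDBACK_PRESETS = {
    "explicit": {"use_ratings": True, "center": True, "aggregate": "weighted-average"},
    "implicit": {"use_ratings": False, "center": False, "aggregate": "sum"},
}

def resolve_options(feedback="explicit", **overrides):
    """Start from the preset for `feedback`, then apply explicit overrides."""
    if feedback not in FEEDBACK_PRESETS:
        raise ValueError(f"unknown feedback mode: {feedback!r}")
    options = dict(FEEDBACK_PRESETS[feedback])
    unknown = set(overrides) - set(options)
    if unknown:
        raise TypeError(f"unexpected options: {sorted(unknown)}")
    options.update(overrides)
    return options

# Pure implicit-feedback defaults.
implicit_opts = resolve_options("implicit")

# Implicit defaults, but with rating values re-enabled.
mixed_opts = resolve_options("implicit", use_ratings=True)
```

In this sketch mixed_opts keeps sum aggregation and no centering while using rating values again, which is the kind of combination the extra constructor parameters make possible.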

New in version 0.14: The feedback option and the ability to ignore rating values were added in LensKit 0.14. In previous versions, you needed to configure each option individually.

Item-based k-NN

This is LensKit’s item-based k-NN model, based on the description by Deshpande and Karypis [DK04].

class lenskit.algorithms.item_knn.ItemItem(nnbrs, min_nbrs=1, min_sim=1e-06, save_nbrs=None, feedback='explicit', **kwargs)

Bases: Predictor

Item-item nearest-neighbor collaborative filtering with ratings. This item-item implementation is not terribly configurable; it hard-codes design decisions found to work well in the previous Java-based LensKit code [ELKR11]. This implementation is based on the description of item-based CF by Deshpande and Karypis [DK04], and produces results equivalent to Java LensKit.

The k-NN predictor supports several aggregate functions:

weighted-average: The weighted average of the user’s rating values, using item-item similarities as weights.
sum: The sum of the similarities between the target item and the user’s rated items, regardless of the rating the user gave the items.
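The two aggregates can be sketched as follows. This is a simplified illustration rather than the library's implementation: it assumes the neighbors have already been selected, represents them as hypothetical (similarity, rating) pairs, and leaves out mean-centering.

```python
def weighted_average(neighbors):
    """Average the user's ratings, weighting each by its item-item similarity."""
    total_weight = sum(abs(sim) for sim, _ in neighbors)
    if total_weight == 0:
        return None  # no usable neighbors
    return sum(sim * rating for sim, rating in neighbors) / total_weight

def similarity_sum(neighbors):
    """Add up the similarities, ignoring the rating values entirely."""
    return sum(sim for sim, _ in neighbors)

# (similarity to the target item, the user's rating of the neighbor item)
neighbors = [(0.5, 4.0), (0.25, 2.0), (0.25, 5.0)]

wa = weighted_average(neighbors)  # (2.0 + 0.5 + 1.25) / 1.0 = 3.75
ss = similarity_sum(neighbors)    # 0.5 + 0.25 + 0.25 = 1.0
```

The weighted average yields a value on the rating scale, whereas the sum is an unbounded score that is mainly useful for ranking.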

Parameters:

nnbrs (int) – the maximum number of neighbors for scoring each item (None for unlimited)
min_nbrs (int) – the minimum number of neighbors for scoring each item
min_sim (float) – minimum similarity threshold for considering a neighbor
save_nbrs (int) – the number of neighbors to save per item in the trained model (None for unlimited)
feedback (str) –
Control how feedback should be interpreted. Specifies defaults for the other settings, which can be overridden individually; can be one of the following values:

explicit
Configure for explicit-feedback mode: use rating values, center ratings, and use the weighted-average aggregate method for prediction. This is the default setting.

implicit
Configure for implicit-feedback mode: ignore rating values, do not center ratings, and use the sum aggregate method for prediction.
center (bool) – whether to normalize (mean-center) rating vectors prior to computing similarities and aggregating user rating values. Defaults to True; turn this off when working with unary data and other data types that don’t respond well to centering.
aggregate (str) – the type of aggregation to do. Can be weighted-average (the default) or sum.
use_ratings (bool) – whether or not to use the rating values. If False, it ignores rating values and considers an implicit feedback signal of 1 for every (user,item) pair present.

item_index_

the index of item IDs.

Type: pandas.Index

item_means_

the mean rating for each known item.

Type: numpy.ndarray

item_counts_

the number of saved neighbors for each item.

Type: numpy.ndarray

sim_matrix_

the similarity matrix.

Type: matrix.CSR

user_index_

the index of known user IDs for the rating matrix.

Type: pandas.Index

rating_matrix_

the user-item rating matrix for looking up users’ ratings.

Type: matrix.CSR

IGNORED_PARAMS = ['feedback']: Names of parameters to ignore in get_params().

EXTRA_PARAMS = ['center', 'aggregate', 'use_ratings']: Names of extra parameters to include in get_params(). Useful when the constructor takes **kwargs.

fit(ratings, **kwargs)

Train a model.

The model-training process depends on save_nbrs and min_sim, but not on other algorithm parameters.
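To make the role of these two parameters concrete, here is a rough sketch of a training step that computes item-item cosine similarities, discards those below min_sim, and keeps at most save_nbrs neighbors per item. It is an assumption-laden illustration, not LensKit's algorithm: build_neighborhoods and the dict-based sparse vectors are hypothetical, and whether the real threshold comparison is strict is not stated in this document.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {user: value} dicts."""
    dot = sum(v * b[u] for u, v in a.items() if u in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm > 0 else 0.0

def build_neighborhoods(item_vectors, min_sim=1e-6, save_nbrs=None):
    """Return {item: [(neighbor, sim), ...]} sorted by decreasing similarity."""
    model = {}
    for i, vec_i in item_vectors.items():
        sims = []
        for j, vec_j in item_vectors.items():
            if i == j:
                continue
            s = cosine(vec_i, vec_j)
            if s >= min_sim:  # assumed non-strict threshold
                sims.append((j, s))
        sims.sort(key=lambda pair: pair[1], reverse=True)
        # Truncate the stored neighbor list when save_nbrs is set.
        model[i] = sims if save_nbrs is None else sims[:save_nbrs]
    return model

item_vectors = {
    "A": {"u1": 1.0, "u2": 1.0},
    "B": {"u1": 1.0, "u2": 1.0, "u3": 1.0},
    "C": {"u3": 1.0},
    "D": {"u4": 1.0},
}
full = build_neighborhoods(item_vectors)
trimmed = build_neighborhoods(item_vectors, save_nbrs=1)
```

In this toy data, item B keeps neighbors A and C in the full model but only A when save_nbrs=1, and item D has no neighbors because it shares no users with any other item.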

Parameters: ratings (pandas.DataFrame) – (user,item,rating) data for computing item similarities.

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters:

user – the user ID
items (array-like) – the items to predict
ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.

Returns:

scores for the items, indexed by item id.

Return type:

pandas.Series
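The interaction between nnbrs and min_nbrs at prediction time can be sketched as below. This is a hedged illustration under simplifying assumptions (positive similarities, no centering, weighted-average aggregation); score_item and the tuple-keyed sims dictionary are hypothetical and do not reflect the library's data structures.

```python
def score_item(target, user_ratings, sims, nnbrs=20, min_nbrs=1):
    """
    Score `target` for one user.

    user_ratings: {item: rating} for the items the user has rated.
    sims: {(target_item, other_item): similarity} from a trained model.
    """
    # Only items the user rated AND that are similar to the target can be neighbors.
    candidates = [
        (sims[(target, j)], rating)
        for j, rating in user_ratings.items()
        if (target, j) in sims
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    if nnbrs is not None:
        candidates = candidates[:nnbrs]  # keep the most similar neighbors
    if not candidates or len(candidates) < min_nbrs:
        return None  # too few neighbors to score this item
    weight = sum(s for s, _ in candidates)
    return sum(s * r for s, r in candidates) / weight

sims = {("X", "A"): 0.5, ("X", "B"): 0.25, ("X", "C"): 0.25}
user_ratings = {"A": 4.0, "B": 2.0, "C": 5.0, "D": 1.0}

all_nbrs = score_item("X", user_ratings, sims, nnbrs=None)   # uses A, B, C
top_one = score_item("X", user_ratings, sims, nnbrs=1)       # uses A only
too_few = score_item("X", user_ratings, sims, min_nbrs=4)    # only 3 neighbors exist
```

Item D never contributes because it has no stored similarity to X, and raising min_nbrs above the number of available neighbors leaves the item unscored.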

User-based k-NN

class lenskit.algorithms.user_knn.UserUser(nnbrs, min_nbrs=1, min_sim=0, feedback='explicit', **kwargs)

Bases: Predictor

User-user nearest-neighbor collaborative filtering with ratings. This user-user implementation is not terribly configurable; it hard-codes design decisions found to work well in the previous Java-based LensKit code.

Parameters:

nnbrs (int) – the maximum number of neighbors for scoring each item (None for unlimited)
min_nbrs (int) – the minimum number of neighbors for scoring each item
min_sim (float) – minimum similarity threshold for considering a neighbor
feedback (str) –
Control how feedback should be interpreted. Specifies defaults for the other settings, which can be overridden individually; can be one of the following values:

explicit
Configure for explicit-feedback mode: use rating values, center ratings, and use the weighted-average aggregate method for prediction. This is the default setting.

implicit
Configure for implicit-feedback mode: ignore rating values, do not center ratings, and use the sum aggregate method for prediction.
center (bool) – whether to normalize (mean-center) rating vectors. Turn this off when working with unary data and other data types that don’t respond well to centering.
aggregate (str) – the type of aggregation to do. Can be weighted-average or sum.
use_ratings (bool) – whether or not to use rating values; default is True. If False, it ignores rating values and treats every present rating as 1.

user_index_

User index.

Type: pandas.Index

item_index_

Item index.

Type: pandas.Index

user_means_

User mean ratings.

Type: numpy.ndarray

rating_matrix_

Normalized user-item rating matrix.

Type: matrix.CSR

transpose_matrix_

Transposed un-normalized rating matrix.

Type: matrix.CSR

IGNORED_PARAMS = ['feedback']: Names of parameters to ignore in get_params().

EXTRA_PARAMS = ['center', 'aggregate', 'use_ratings']: Names of extra parameters to include in get_params(). Useful when the constructor takes **kwargs.

fit(ratings, **kwargs)

“Train” a user-user CF model. This memorizes the rating data in a format that is usable for future computations.

Parameters: ratings (pandas.DataFrame) – (user, item, rating) data for collaborative filtering.

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters:

user – the user ID
items (array-like) – the items to predict
ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they will be used to recompute the user’s bias at prediction time.

Returns:

scores for the items, indexed by item id.

Return type:

pandas.Series
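A minimal sketch of mean-centered user-based prediction, including the effect of supplying ratings at prediction time, might look like the following. It is not the library's implementation: predict_user_based is a hypothetical helper, the user-user similarities are taken as given inputs rather than computed, and the user's bias is assumed to be the mean of their ratings.

```python
def predict_user_based(target_ratings, neighbors, item):
    """
    target_ratings: {item: rating} for the target user.
    neighbors: list of (similarity, {item: rating}) for other users.
    """
    # The target user's bias (mean rating) is computed from whatever ratings are supplied.
    target_mean = sum(target_ratings.values()) / len(target_ratings)
    num = 0.0
    den = 0.0
    for sim, ratings in neighbors:
        if item not in ratings:
            continue  # this neighbor cannot vote on the item
        nbr_mean = sum(ratings.values()) / len(ratings)
        num += sim * (ratings[item] - nbr_mean)  # neighbor's centered rating
        den += abs(sim)
    if den == 0:
        return None
    # Weighted average of centered ratings, shifted back by the target user's mean.
    return target_mean + num / den

neighbors = [
    (0.5, {"a": 5.0, "c": 3.0}),  # mean 4.0, offset for "c" is -1.0
    (0.5, {"b": 1.0, "c": 5.0}),  # mean 3.0, offset for "c" is +2.0
]

stored = {"a": 4.0, "b": 2.0}    # ratings known at training time, mean 3.0
supplied = {"a": 5.0, "b": 3.0}  # ratings passed at prediction time, mean 4.0

pred_stored = predict_user_based(stored, neighbors, "c")      # 3.0 + 0.5 = 3.5
pred_supplied = predict_user_based(supplied, neighbors, "c")  # 4.0 + 0.5 = 4.5
```

The neighbors contribute the same centered offset in both calls; only the target user's recomputed mean changes the final score.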