k-NN Collaborative Filtering
LKPY provides user- and item-based classical k-NN collaborative filtering implementations. These lightly configurable implementations are intended to capture the behavior of the Java-based LensKit implementations, providing a good upgrade path and enabling basic experiments out of the box.
You can use these algorithms in two primary modes. When using explicit
feedback (rating values), you usually want the defaults of weighted-average aggregation and
mean-centering normalization. This is the default mode, and can be selected explicitly by passing
feedback='explicit' to the class constructor.
With implicit feedback (unary data such as clicks and purchases, typically represented with
rating values of 1 for positive items), the usual design is sum aggregation and no centering. This
can be selected with feedback='implicit', which also configures the algorithm to ignore rating
values (when present) and treat every rating as 1:
implicit_knn = ItemItem(20, feedback='implicit')
Attempting to center data whose values are all the same (all 1, for example) will typically produce invalid results. ItemItem has diagnostics to warn you about this.
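To see why centering breaks down on unary data, consider a rating vector whose entries are all 1. The following is a minimal plain-Python sketch, not LensKit code; `mean_center` and `cosine` are hypothetical helpers introduced only for illustration. Mean-centering leaves an all-zero vector, so a cosine similarity computed from it has a zero denominator:

```python
import math


def mean_center(values):
    """Subtract the vector's mean from each entry."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cosine(a, b):
    """Cosine similarity; returns None when either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return None  # similarity is undefined for a zero vector
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Implicit feedback: every observed entry is 1.
unary_a = [1.0, 1.0, 1.0]
unary_b = [1.0, 1.0, 1.0]

centered_a = mean_center(unary_a)
centered_b = mean_center(unary_b)

raw_similarity = cosine(unary_a, unary_b)
centered_similarity = cosine(centered_a, centered_b)
```

Without centering the two vectors are perfectly similar; after centering, no similarity can be computed at all.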
The feedback option only sets defaults; the algorithm can be further configured (e.g. to re-enable
rating values) with additional parameters to the constructor.
Added in version 0.14: The feedback option and the ability to ignore rating values were added in LensKit 0.14.
In previous versions, you had to configure each option individually.
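The relationship between the feedback mode and the individual options can be sketched as a table of defaults plus overrides. This is an illustrative sketch only, not LensKit's implementation: `FEEDBACK_DEFAULTS` and `resolve_options` are hypothetical names, and the default values are taken from the parameter descriptions in this document.

```python
# Defaults implied by each feedback mode, as described in the parameter docs.
FEEDBACK_DEFAULTS = {
    "explicit": {"center": True, "aggregate": "weighted-average", "use_ratings": True},
    "implicit": {"center": False, "aggregate": "sum", "use_ratings": False},
}


def resolve_options(feedback="explicit", **overrides):
    """Hypothetical helper: start from the mode's defaults, then apply overrides."""
    if feedback not in FEEDBACK_DEFAULTS:
        raise ValueError(f"unknown feedback mode: {feedback!r}")
    options = dict(FEEDBACK_DEFAULTS[feedback])
    unknown = set(overrides) - set(options)
    if unknown:
        raise TypeError(f"unexpected options: {sorted(unknown)}")
    options.update(overrides)
    return options


explicit_opts = resolve_options()
implicit_opts = resolve_options("implicit")
# Implicit-mode defaults, but with rating values re-enabled.
mixed_opts = resolve_options("implicit", use_ratings=True)
```

With the documented constructor, the last case would presumably be written as ItemItem(20, feedback='implicit', use_ratings=True).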
Item-based k-NN
This is LensKit’s item-based k-NN model, based on the description by Deshpande and Karypis [DK04].
- class lenskit.algorithms.knn.ItemItem(nnbrs, min_nbrs=1, min_sim=1e-06, save_nbrs=None, feedback='explicit', block_size=250, **kwargs)
Bases: Predictor
Item-item nearest-neighbor collaborative filtering with ratings. This item-item implementation is not terribly configurable; it hard-codes design decisions found to work well in the previous Java-based LensKit code [ELKR11]. This implementation is based on the description of item-based CF by Deshpande and Karypis [DK04], and produces results equivalent to Java LensKit.
The k-NN predictor supports several aggregate functions:
weighted-average
The weighted average of the user’s rating values, using item-item similarities as weights.
sum
The sum of the similarities between the target item and the user’s rated items, regardless of the rating the user gave the items.
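The two aggregates can be illustrated with a few lines of plain Python. This is a sketch with made-up numbers, not the library's code; the function names are hypothetical, and mean-centering is left out to keep the arithmetic visible.

```python
def weighted_average(similarities, ratings):
    """Similarity-weighted mean of the user's ratings for the neighbor items."""
    total_weight = sum(similarities)
    if total_weight == 0:
        return None  # no usable neighbors
    return sum(s * r for s, r in zip(similarities, ratings)) / total_weight


def similarity_sum(similarities):
    """Sum of neighbor similarities; rating values play no part."""
    return sum(similarities)


# Similarities between the target item and three items the user has rated,
# and the user's ratings for those three items (illustrative values).
sims = [0.5, 0.25, 0.25]
user_ratings = [4.0, 2.0, 5.0]

wa_score = weighted_average(sims, user_ratings)  # (2.0 + 0.5 + 1.25) / 1.0
sum_score = similarity_sum(sims)
```

The weighted average stays on the rating scale, while the sum grows with the number and strength of neighbors, which is why it suits implicit feedback.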
- Parameters:
nnbrs (int) – the maximum number of neighbors for scoring each item (None for unlimited)
min_nbrs (int) – the minimum number of neighbors for scoring each item
min_sim (float) – minimum similarity threshold for considering a neighbor. Must be positive; if less than the smallest 32-bit normal (\(1.175 \times 10^{-38}\)), it is clamped to that value.
save_nbrs (int | None) – the number of neighbors to save per item in the trained model (None for unlimited)
feedback (Literal['explicit', 'implicit']) – control how feedback should be interpreted. Specifies defaults for the other settings, which can be overridden individually; can be one of the following values:
explicit
Configure for explicit-feedback mode: use rating values, center ratings, and use the weighted-average aggregate method for prediction. This is the default setting.
implicit
Configure for implicit-feedback mode: ignore rating values, do not center ratings, and use the sum aggregate method for prediction.
center – whether to normalize (mean-center) rating vectors prior to computing similarities and aggregating user rating values. Defaults to True; turn this off when working with unary data and other data types that don’t respond well to centering.
aggregate – the type of aggregation to do. Can be weighted-average (the default) or sum.
use_ratings – whether or not to use the rating values. If False, it ignores rating values and considers an implicit feedback signal of 1 for every (user,item) pair present.
block_size (int)
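The clamping rule for min_sim can be sketched as follows. `clamp_min_sim` is a hypothetical helper, not a LensKit function. The documentation says the value must be positive but does not say how a non-positive value is handled, so rejecting it with an exception is an assumption of this sketch.

```python
# Smallest positive normal 32-bit float: 2**-126, about 1.175e-38.
FLOAT32_SMALLEST_NORMAL = 2.0 ** -126


def clamp_min_sim(min_sim):
    """Hypothetical helper illustrating the documented clamping rule."""
    if min_sim <= 0:
        # Assumption: non-positive thresholds are rejected outright.
        raise ValueError("min_sim must be positive")
    return max(min_sim, FLOAT32_SMALLEST_NORMAL)


default_threshold = clamp_min_sim(1e-6)   # unchanged
tiny_threshold = clamp_min_sim(1e-45)     # raised to the 32-bit floor
```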
- IGNORED_PARAMS = ['feedback']
Names of parameters to ignore in get_params().
- EXTRA_PARAMS = ['center', 'aggregate', 'use_ratings']
Names of extra parameters to include in get_params(). Useful when the constructor takes **kwargs.
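One way such lists could interact with get_params() is sketched below. `SketchAlgorithm` is a hypothetical stand-in class, and the way it inspects the constructor signature is an assumption about how this kind of mechanism typically works, not a description of LensKit's actual code.

```python
import inspect


class SketchAlgorithm:
    """Hypothetical stand-in for an algorithm class with **kwargs options."""

    IGNORED_PARAMS = ["feedback"]
    EXTRA_PARAMS = ["center", "aggregate", "use_ratings"]

    def __init__(self, nnbrs, min_nbrs=1, feedback="explicit", **kwargs):
        self.nnbrs = nnbrs
        self.min_nbrs = min_nbrs
        explicit = feedback == "explicit"
        # feedback only chooses defaults; the resolved options are what is stored.
        self.center = kwargs.get("center", explicit)
        self.aggregate = kwargs.get("aggregate", "weighted-average" if explicit else "sum")
        self.use_ratings = kwargs.get("use_ratings", explicit)

    def get_params(self):
        # Named constructor parameters, minus ignored ones and the **kwargs slot.
        sig = inspect.signature(type(self))
        names = [
            name
            for name, param in sig.parameters.items()
            if param.kind is not inspect.Parameter.VAR_KEYWORD
            and name not in self.IGNORED_PARAMS
        ]
        # Options that arrive through **kwargs are listed explicitly.
        names.extend(self.EXTRA_PARAMS)
        return {name: getattr(self, name) for name in names}


params = SketchAlgorithm(20, feedback="implicit").get_params()
```

Reporting the resolved options rather than feedback itself means a copy built from these parameters keeps any individually overridden setting.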
- fit(ratings, **kwargs)
Train a model.
The model-training process depends on save_nbrs and min_sim, but not on other algorithm parameters.
- Parameters:
ratings (DataFrame) – (user,item,rating) data for computing item similarities.
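How save_nbrs and min_sim could shape a trained item-item model is sketched below in plain Python. This is a simplified illustration under stated assumptions, not the library's training code: it uses uncentered cosine similarity over dense loops, and `item_neighborhoods` is a hypothetical name.

```python
import math
from collections import defaultdict


def item_neighborhoods(ratings, min_sim=1e-6, save_nbrs=None):
    """Cosine item-item similarities from (user, item, rating) triples.

    Keeps only similarities of at least min_sim and, when save_nbrs is given,
    only the save_nbrs most similar neighbors per item.
    """
    # One sparse rating vector per item: {item: {user: rating}}.
    vectors = defaultdict(dict)
    for user, item, rating in ratings:
        vectors[item][user] = rating

    norms = {i: math.sqrt(sum(r * r for r in vec.values())) for i, vec in vectors.items()}

    neighbors = {}
    for i, vec_i in vectors.items():
        scored = []
        for j, vec_j in vectors.items():
            if i == j or norms[i] == 0 or norms[j] == 0:
                continue
            dot = sum(r * vec_j[u] for u, r in vec_i.items() if u in vec_j)
            sim = dot / (norms[i] * norms[j])
            if sim >= min_sim:
                scored.append((j, sim))
        # Most similar first, then truncate if a limit was requested.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        neighbors[i] = scored if save_nbrs is None else scored[:save_nbrs]
    return neighbors


ratings = [
    ("u1", "a", 1.0), ("u1", "b", 1.0),
    ("u2", "a", 1.0), ("u2", "b", 1.0),
    ("u3", "a", 1.0), ("u3", "c", 1.0),
    ("u4", "d", 1.0),
]
full = item_neighborhoods(ratings)
truncated = item_neighborhoods(ratings, save_nbrs=1)
```

Item d shares no users with any other item, so it ends up with an empty neighbor list; item a keeps both b and c in the full model but only b when save_nbrs=1.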
- predict_for_user(user, items, ratings=None)
Compute predictions for a user and items.
- Parameters:
user – the user ID
items (array-like) – the items to predict
ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.
- Returns:
scores for the items, indexed by item id.
- Return type:
User-based k-NN
- class lenskit.algorithms.knn.UserUser(nnbrs, min_nbrs=1, min_sim=1e-06, feedback='explicit', **kwargs)
Bases: Predictor
User-user nearest-neighbor collaborative filtering with ratings. This user-user implementation is not terribly configurable; it hard-codes design decisions found to work well in the previous Java-based LensKit code.
- Parameters:
nnbrs (int) – the maximum number of neighbors for scoring each item (None for unlimited).
min_nbrs (int) – the minimum number of neighbors for scoring each item.
min_sim (float) – minimum similarity threshold for considering a neighbor. Must be positive; if less than the smallest 32-bit normal (\(1.175 \times 10^{-38}\)), it is clamped to that value.
feedback (FeedbackType) – control how feedback should be interpreted. Specifies defaults for the other settings, which can be overridden individually; can be one of the following values:
explicit
Configure for explicit-feedback mode: use rating values, center ratings, and use the weighted-average aggregate method for prediction. This is the default setting.
implicit
Configure for implicit-feedback mode: ignore rating values, do not center ratings, and use the sum aggregate method for prediction.
center – whether to normalize (mean-center) rating vectors. Turn this off when working with unary data and other data types that don’t respond well to centering.
aggregate – the type of aggregation to do. Can be weighted-average or sum.
use_ratings – whether or not to use rating values; default is True. If False, it ignores rating values and treats every present rating as 1.
- IGNORED_PARAMS = ['feedback']
Names of parameters to ignore in get_params().
- EXTRA_PARAMS = ['center', 'aggregate', 'use_ratings']
Names of extra parameters to include in get_params(). Useful when the constructor takes **kwargs.
- user_index_: pd.Index[Any]
The index of user IDs.
- item_index_: pd.Index[Any]
The index of item IDs.
- user_means_: torch.Tensor | None
Mean rating for each known user.
- user_vectors_: torch.Tensor
Normalized rating matrix (CSR) to find neighbors at prediction time.
- user_ratings_: torch.Tensor
Centered but un-normalized rating matrix (COO) to find neighbor ratings.
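The roles of these three structures can be illustrated with a small plain-Python sketch of explicit-feedback user-user scoring. It uses dictionaries instead of sparse tensors and is an assumption-laden illustration rather than the library's implementation; `fit_user_vectors` and `predict` are hypothetical names.

```python
import math


def fit_user_vectors(ratings):
    """Return per-user means, centered ratings, and unit-length centered vectors."""
    by_user = {}
    for user, item, rating in ratings:
        by_user.setdefault(user, {})[item] = rating
    means, centered, normalized = {}, {}, {}
    for user, row in by_user.items():
        mean = sum(row.values()) / len(row)
        offsets = {i: r - mean for i, r in row.items()}
        norm = math.sqrt(sum(v * v for v in offsets.values()))
        means[user] = mean          # analogous to user_means_
        centered[user] = offsets    # analogous to user_ratings_
        normalized[user] = {i: v / norm for i, v in offsets.items()} if norm > 0 else {}
    return means, centered, normalized


def predict(user, item, means, centered, normalized, nnbrs=20, min_sim=1e-6):
    """Weighted average of neighbor offsets for the item, plus the user's mean."""
    target = normalized[user]
    candidates = []
    for other, vec in normalized.items():
        if other == user or item not in centered[other]:
            continue
        # Dot product of unit vectors is the cosine similarity.
        sim = sum(v * vec[i] for i, v in target.items() if i in vec)
        if sim >= min_sim:
            candidates.append((sim, centered[other][item]))
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    top = candidates[:nnbrs]
    if not top:
        return None
    weight = sum(sim for sim, _ in top)
    return means[user] + sum(sim * off for sim, off in top) / weight


data = [
    ("alice", "a", 5.0), ("alice", "b", 3.0),
    ("bob", "a", 4.0), ("bob", "b", 2.0), ("bob", "c", 4.5),
    ("carol", "a", 1.0), ("carol", "b", 5.0), ("carol", "c", 3.0),
]
means, centered, normalized = fit_user_vectors(data)
score_c = predict("alice", "c", means, centered, normalized)
score_unknown = predict("alice", "z", means, centered, normalized)
```

Here bob's tastes align with alice's while carol's are opposite, so only bob passes the similarity threshold; his above-average rating of item c lifts alice's predicted score above her own mean.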
- fit(ratings, **kwargs)
“Train” a user-user CF model. This memorizes the rating data in a format that is usable for future computations.
- Parameters:
ratings (pandas.DataFrame) – (user, item, rating) data for collaborative filtering.
- Return type:
- predict_for_user(user, items, ratings=None)
Compute predictions for a user and items.
- Parameters:
user – the user ID
items (array-like) – the items to predict
ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, will be used to recompute the user’s bias at prediction time.
- Returns:
scores for the items, indexed by item id.
- Return type: