kNN Collaborative Filtering
LKPY provides user- and item-based classical kNN collaborative filtering implementations. These lightly-configurable implementations are intended to capture the behavior of the Java-based LensKit implementations to provide a good upgrade path and enable basic experiments out of the box.
Item-based kNN

class lenskit.algorithms.item_knn.ItemItem(nnbrs, min_nbrs=1, min_sim=1e-06, save_nbrs=None, center=True, aggregate='weighted-average')
Bases: lenskit.algorithms.Predictor
Item-item nearest-neighbor collaborative filtering with ratings. This item-item implementation is not terribly configurable; it hard-codes design decisions found to work well in the previous Java-based LensKit code.
Parameters
nnbrs (int) – the maximum number of neighbors for scoring each item (None for unlimited)
min_nbrs (int) – the minimum number of neighbors for scoring each item
min_sim (double) – minimum similarity threshold for considering a neighbor
save_nbrs (double) – the number of neighbors to save per item in the trained model (None for unlimited)
center (bool) – whether to normalize (mean-center) rating vectors. Turn this off when working with unary data and other data types that don’t respond well to centering.
aggregate – the type of aggregation to do. Can be weighted-average or sum.
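To make the interaction between nnbrs, min_nbrs, min_sim, and aggregate concrete, here is a minimal pure-Python sketch of scoring a single item. It is not LKPY's implementation: the function name score_item, the (similarity, rating) input format, and the assumption that sum totals similarity-weighted ratings are all illustrative.

```python
def score_item(neighbors, nnbrs=20, min_nbrs=1, min_sim=1e-6,
               aggregate="weighted-average"):
    """Score one target item from (similarity, rating) pairs.

    Hypothetical helper, not part of LKPY. Each pair describes one item the
    user has rated: its similarity to the target item and the user's
    (possibly mean-centered) rating of it. Returns None when too few
    neighbors qualify.
    """
    # Drop neighbors below the similarity threshold.
    usable = [(s, r) for s, r in neighbors if s >= min_sim]
    # Keep the most similar neighbors; nnbrs=None means no limit.
    usable.sort(key=lambda pair: pair[0], reverse=True)
    if nnbrs is not None:
        usable = usable[:nnbrs]
    # Refuse to score when the neighborhood is too small.
    if len(usable) < min_nbrs:
        return None
    total_weight = sum(abs(s) for s, _ in usable)
    if total_weight == 0:
        return None
    weighted = sum(s * r for s, r in usable)
    if aggregate == "sum":
        # Assumption: "sum" totals the similarity-weighted values.
        return weighted
    if aggregate == "weighted-average":
        return weighted / total_weight
    raise ValueError(f"unknown aggregate: {aggregate}")
```

With unary data every rating is effectively 1, so under this assumption the sum aggregate reduces to the total similarity of the qualifying neighbors, which is why it pairs naturally with center=False.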

item_index_ – the index of item IDs.
item_means_ – the mean rating for each known item.
item_counts_ – the number of saved neighbors for each item.
sim_matrix_ – the similarity matrix.
user_index_ – the index of known user IDs for the rating matrix.
rating_matrix_ – the user-item rating matrix for looking up users’ ratings.

fit(ratings)
Train a model. The model-training process depends on save_nbrs and min_sim, but not on other algorithm parameters.
Parameters
ratings (pandas.DataFrame) – (user, item, rating) data for computing item similarities.
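The role of min_sim and save_nbrs during training can be illustrated with a small pure-Python sketch. This is an assumption-laden illustration rather than LKPY's code: the function train_item_sims, the use of cosine similarity over item-mean-centered vectors, and the dictionary-based model are all hypothetical choices made for readability.

```python
import math
from collections import defaultdict


def train_item_sims(ratings, min_sim=1e-6, save_nbrs=None):
    """Build {item: [(neighbor, similarity), ...]} from (user, item, rating) rows.

    Hypothetical helper, not part of LKPY.
    """
    by_item = defaultdict(dict)
    for user, item, rating in ratings:
        by_item[item][user] = rating

    # Assumption: center each item's vector on that item's mean rating.
    vectors = {}
    for item, vec in by_item.items():
        mean = sum(vec.values()) / len(vec)
        vectors[item] = {u: r - mean for u, r in vec.items()}

    norms = {i: math.sqrt(sum(v * v for v in vec.values()))
             for i, vec in vectors.items()}

    model = {}
    for i, vec_i in vectors.items():
        sims = []
        for j, vec_j in vectors.items():
            if i == j or norms[i] == 0 or norms[j] == 0:
                continue
            # Cosine similarity over the users the two items share.
            dot = sum(v * vec_j[u] for u, v in vec_i.items() if u in vec_j)
            sim = dot / (norms[i] * norms[j])
            # min_sim discards weak or negative neighbors at training time.
            if sim >= min_sim:
                sims.append((j, sim))
        # save_nbrs caps how many neighbors are stored per item.
        sims.sort(key=lambda pair: pair[1], reverse=True)
        model[i] = sims if save_nbrs is None else sims[:save_nbrs]
    return model
```

Because both thresholds are applied while the model is built, changing either one requires retraining, whereas a scoring-time parameter such as nnbrs does not.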

predict_for_user(user, items, ratings=None)
Compute predictions for a user and items.
Parameters
user – the user ID
items (array-like) – the items to predict
ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.
Returns
scores for the items, indexed by item id.
User-based kNN

class lenskit.algorithms.user_knn.UserUser(nnbrs, min_nbrs=1, min_sim=0, center=True, aggregate='weighted-average')
Bases: lenskit.algorithms.Predictor
User-user nearest-neighbor collaborative filtering with ratings. This user-user implementation is not terribly configurable; it hard-codes design decisions found to work well in the previous Java-based LensKit code.
Parameters
nnbrs (int) – the maximum number of neighbors for scoring each item (None for unlimited)
min_nbrs (int) – the minimum number of neighbors for scoring each item
min_sim (double) – minimum similarity threshold for considering a neighbor
center (bool) – whether to normalize (mean-center) rating vectors. Turn this off when working with unary data and other data types that don’t respond well to centering.
aggregate – the type of aggregation to do. Can be weighted-average or sum.
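The effect of center on a user-based prediction can be sketched as follows. This is a simplified illustration under stated assumptions, not LKPY's code: the function user_user_score and its (similarity, rating, neighbor_mean) input format are hypothetical, and it assumes that centered predictions are shifted back by the target user's mean.

```python
def user_user_score(target_mean, neighbors, center=True):
    """Weighted-average prediction for one item.

    Hypothetical helper, not part of LKPY. `neighbors` is a list of
    (similarity, rating, neighbor_mean) triples for users who rated the item.
    """
    total_weight = sum(abs(s) for s, _, _ in neighbors)
    if total_weight == 0:
        return None
    if center:
        # Average each neighbor's offset from their own mean, then shift
        # the result back onto the target user's rating scale.
        offset = sum(s * (r - m) for s, r, m in neighbors) / total_weight
        return target_mean + offset
    # Without centering, average the raw ratings directly.
    return sum(s * r for s, r, _ in neighbors) / total_weight
```

For unary data every rating equals the user's mean, so all centered offsets are zero and carry no signal, which is one way to see why the documentation recommends center=False in that case.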

user_index_ – User index.
item_index_ – Item index.
user_means_ – User mean ratings.
rating_matrix_ – Normalized user-item rating matrix.
transpose_matrix_ – Transposed un-normalized rating matrix.

fit(ratings)
“Train” a user-user CF model. This memorizes the rating data in a format that is usable for future computations.
Parameters
ratings (pandas.DataFrame) – (user, item, rating) data for collaborative filtering.
Returns
a memorized model for efficient user-based CF computation.
Return type
UUModel
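As a rough picture of what "memorizing" the ratings might involve, the sketch below builds plain-dictionary analogues of user_means_, rating_matrix_, and transpose_matrix_. It is only an illustration: the function memorize is hypothetical, LKPY stores these as index and matrix structures rather than dictionaries, and normalization is assumed here to mean mean-centering only.

```python
from collections import defaultdict


def memorize(ratings):
    """Return (user_means, normalized, transposed) from (user, item, rating) rows.

    Hypothetical helper, not part of LKPY.
    """
    by_user = defaultdict(dict)
    transposed = defaultdict(dict)
    for user, item, rating in ratings:
        by_user[user][item] = rating
        # Item-major copy keeps the original, un-normalized ratings.
        transposed[item][user] = rating

    user_means = {u: sum(vec.values()) / len(vec)
                  for u, vec in by_user.items()}
    # User-major copy stores each rating as an offset from the user's mean.
    normalized = {u: {i: r - user_means[u] for i, r in vec.items()}
                  for u, vec in by_user.items()}
    return user_means, normalized, dict(transposed)
```

Keeping both orientations trades memory for speed: the user-major copy serves similarity computation, while the item-major copy makes it cheap to find everyone who rated a given item.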

predict_for_user(user, items, ratings=None)
Compute predictions for a user and items.
Parameters
user – the user ID
items (array-like) – the items to predict
ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, will be used to recompute the user’s bias at prediction time.
Returns
scores for the items, indexed by item id.