k-NN Collaborative Filtering

LKPY provides user- and item-based classical k-NN collaborative filtering implementations. These lightly configurable implementations are intended to capture the behavior of the Java-based LensKit implementations, providing a good upgrade path and enabling basic experiments out of the box.

Item-based k-NN

class lenskit.algorithms.item_knn.ItemItem(nnbrs, min_nbrs=1, min_sim=1e-06, save_nbrs=None, center=True, aggregate='weighted-average')

Bases: lenskit.algorithms.Trainable, lenskit.algorithms.Predictor

Item-item nearest-neighbor collaborative filtering with ratings. This item-item implementation is not terribly configurable; it hard-codes design decisions found to work well in the previous Java-based LensKit code.

load_model(path)

Load a trained model from a file.

Parameters: path (str) – the path to the file from which to load the model.
Returns: the re-loaded model (of an implementation-defined type).
predict(model, user, items, ratings=None)

Compute predictions for a user and items.

Parameters:
  • model – the trained model to use. Either None or the ratings matrix if the algorithm has no concept of training.
  • user – the user ID
  • items (array-like) – the items to predict
  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.
Returns:

scores for the items, indexed by item id.

Return type:

pandas.Series
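
To illustrate what aggregate='weighted-average' with center=True amounts to, here is a minimal pure-Python sketch that scores one item for one user. It is not LKPY's implementation: the function name score_item, the dict-based inputs, and the sims lookup table are assumptions made for this example, and the library itself works on the CSR matrices stored in the trained model.

```python
def score_item(target, user_ratings, sims, item_means,
               nnbrs=20, min_nbrs=1, min_sim=1e-6):
    """Score `target` for one user; return None if it cannot be scored.

    user_ratings: {item: rating} for the user (hypothetical layout)
    sims:         {(item_a, item_b): similarity} (hypothetical layout)
    item_means:   {item: mean rating}
    """
    # Neighbors are items the user has rated that are similar enough.
    nbrs = []
    for item, rating in user_ratings.items():
        if item == target:
            continue
        sim = sims.get((target, item), 0.0)
        if sim >= min_sim:
            # Mean-center the rating against the neighbor item's mean.
            nbrs.append((sim, rating - item_means[item]))

    if len(nbrs) < min_nbrs:
        return None

    # Keep only the nnbrs most similar neighbors.
    nbrs.sort(key=lambda pair: pair[0], reverse=True)
    nbrs = nbrs[:nnbrs]

    total_sim = sum(sim for sim, _ in nbrs)
    if total_sim <= 0:
        return None

    # Similarity-weighted average of the centered ratings,
    # shifted back by the target item's mean.
    offset = sum(sim * dev for sim, dev in nbrs) / total_sim
    return item_means[target] + offset


item_means = {"a": 3.0, "b": 4.0, "c": 2.0}
user_ratings = {"b": 5.0, "c": 1.0}
sims = {("a", "b"): 0.5, ("a", "c"): 0.25}
score = score_item("a", user_ratings, sims, item_means)
```

In this toy data the user rated item b one point above its mean and item c one point below its mean, so the score for item a lands above a's mean because b is the more similar neighbor.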

save_model(model, path)

Save a trained model to a file or directory. The default implementation pickles the model.

Algorithms are allowed to use any format for saving their models, including directories.

Parameters:
  • model – the trained model.
  • path (str) – the path at which to save the model.
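
Because the default implementation pickles the model, a save/load round trip can be sketched with the standard library alone. The helper names pickle_model and unpickle_model and the dict used as a stand-in model are hypothetical; they only illustrate the pickling behavior described above and are not LKPY functions.

```python
import os
import pickle
import tempfile


def pickle_model(model, path):
    """Write any picklable model object to `path`."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def unpickle_model(path):
    """Read a model object back from `path`."""
    with open(path, "rb") as f:
        return pickle.load(f)


# A stand-in for a trained model (hypothetical contents).
model = {"items": ["a", "b"], "means": [3.5, 4.0]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    pickle_model(model, path)
    restored = unpickle_model(path)
```

As with any pickle file, a saved model should only be loaded from a trusted source.
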
train(ratings)

Train a model.

The model-training process depends on save_nbrs and min_sim, but not on other algorithm parameters.

Parameters: ratings (pandas.DataFrame) – (user, item, rating) data for computing item similarities.
Returns: a trained item-item CF model.
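
The role of min_sim and save_nbrs during training can be sketched as follows. This is an illustration only, under the assumption that similarities are cosine similarities between mean-centered item rating vectors; the function name train_item_sims and the dict-of-lists result are hypothetical and do not reflect the actual model layout.

```python
import math
from collections import defaultdict


def train_item_sims(ratings, min_sim=1e-6, save_nbrs=None):
    """Build {item: [(neighbor, sim), ...]} from (user, item, rating) triples."""
    by_item = defaultdict(dict)
    for user, item, rating in ratings:
        by_item[item][user] = rating

    # Mean-center and L2-normalize each item's rating vector so that
    # a dot product between two vectors is their cosine similarity.
    vectors = {}
    for item, col in by_item.items():
        mean = sum(col.values()) / len(col)
        centered = {u: r - mean for u, r in col.items()}
        norm = math.sqrt(sum(v * v for v in centered.values()))
        if norm > 0:  # items with no rating variance are skipped
            vectors[item] = {u: v / norm for u, v in centered.items()}

    model = {}
    for a, va in vectors.items():
        nbrs = []
        for b, vb in vectors.items():
            if a == b:
                continue
            sim = sum(x * vb[u] for u, x in va.items() if u in vb)
            if sim >= min_sim:  # drop weak or negative similarities
                nbrs.append((b, sim))
        nbrs.sort(key=lambda pair: pair[1], reverse=True)
        # save_nbrs caps how many neighbors are stored per item.
        model[a] = nbrs if save_nbrs is None else nbrs[:save_nbrs]
    return model


ratings = [
    ("u1", "a", 5), ("u2", "a", 3),
    ("u1", "b", 4), ("u2", "b", 2),
    ("u1", "c", 1), ("u2", "c", 5),
]
model = train_item_sims(ratings)
```

Here items a and b move together across users and become neighbors, while item c is rated in the opposite direction and is filtered out by min_sim.
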
class lenskit.algorithms.item_knn.IIModel

Item-item recommendation model. This stores the necessary data to run the item-based k-NN recommender.

items

the index of item IDs.

Type: pandas.Index
means

the mean rating for each known item.

Type: numpy.ndarray
counts

the number of saved neighbors for each item.

Type: numpy.ndarray
sim_matrix

the similarity matrix.

Type: matrix.CSR
users

the index of known user IDs for the rating matrix.

Type: pandas.Index
rating_matrix

the user-item rating matrix for looking up users’ ratings.

Type: matrix.CSR
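
For readers unfamiliar with compressed sparse row (CSR) storage, the sketch below shows the general idea in pure Python: one array of row offsets plus parallel arrays of column indices and values. The names to_csr, csr_row, rowptrs, colinds, and values are assumptions for this example and are not taken from this page; consult the matrix module for the actual attributes.

```python
def to_csr(triples, nrows):
    """Convert (row, col, value) triples into CSR arrays."""
    # Count entries per row, then turn counts into cumulative offsets.
    rowptrs = [0] * (nrows + 1)
    for r, _, _ in triples:
        rowptrs[r + 1] += 1
    for i in range(nrows):
        rowptrs[i + 1] += rowptrs[i]

    colinds = [0] * len(triples)
    values = [0.0] * len(triples)
    next_slot = rowptrs[:-1].copy()  # next free position in each row
    for r, c, v in triples:
        pos = next_slot[r]
        colinds[pos] = c
        values[pos] = v
        next_slot[r] += 1
    return rowptrs, colinds, values


def csr_row(csr, r):
    """Return the (col, value) pairs stored for row r."""
    rowptrs, colinds, values = csr
    start, end = rowptrs[r], rowptrs[r + 1]
    return list(zip(colinds[start:end], values[start:end]))


# Three ratings in a 3-row matrix; row 1 is empty.
csr = to_csr([(0, 1, 4.0), (2, 0, 3.0), (0, 2, 5.0)], nrows=3)
first_row = csr_row(csr, 0)
```

Looking up one user's ratings or one item's neighbors is then a single slice, which is why this layout suits both the rating matrix and the similarity matrix.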

User-based k-NN

class lenskit.algorithms.user_knn.UserUser(nnbrs, min_nbrs=1, min_sim=0, center=True, aggregate='weighted-average')

Bases: lenskit.algorithms.Trainable, lenskit.algorithms.Predictor

User-user nearest-neighbor collaborative filtering with ratings. This user-user implementation is not terribly configurable; it hard-codes design decisions found to work well in the previous Java-based LensKit code.

load_model(path)

Load a trained model from a file.

Parameters: path (str) – the path to the file from which to load the model.
Returns: the re-loaded model (of an implementation-defined type).
predict(model, user, items, ratings=None)

Compute predictions for a user and items.

Parameters:
  • model (UUModel) – the memorized data to use.
  • user – the user ID
  • items (array-like) – the items to predict
  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, will be used to recompute the user’s bias at prediction time.
Returns:

scores for the items, indexed by item id.

Return type:

pandas.Series
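
The following pure-Python sketch shows one common way to compute a user-based score: find other users who rated the item, weight their mean-centered ratings by similarity to the target user, and add the result to the target user's mean. It is a simplified illustration, not LKPY's code. The function names, the list-of-dicts input, and the choice to keep only positively similar neighbors are assumptions of this sketch. The target user's mean is computed from the ratings passed in, which loosely mirrors the effect of supplying ratings at prediction time.

```python
import math


def centered(ratings):
    """Return (mean, {item: rating - mean}) for one user's ratings."""
    mean = sum(ratings.values()) / len(ratings)
    return mean, {item: r - mean for item, r in ratings.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm > 0 else 0.0


def predict_user_user(user_ratings, others, item,
                      nnbrs=20, min_nbrs=1, min_sim=0.0):
    """Score `item` for a user; return None if it cannot be scored."""
    u_mean, u_vec = centered(user_ratings)

    nbrs = []
    for other in others:
        if item not in other:
            continue  # only users who rated the item can contribute
        _, o_vec = centered(other)
        sim = cosine(u_vec, o_vec)
        if sim > 0 and sim >= min_sim:
            nbrs.append((sim, o_vec[item]))

    if len(nbrs) < min_nbrs:
        return None

    nbrs.sort(key=lambda pair: pair[0], reverse=True)
    nbrs = nbrs[:nnbrs]
    total_sim = sum(sim for sim, _ in nbrs)
    if total_sim <= 0:
        return None
    return u_mean + sum(sim * dev for sim, dev in nbrs) / total_sim


user_ratings = {"a": 5, "b": 3}
others = [
    {"a": 4, "b": 2, "c": 4, "d": 2},  # agrees with the user on a vs. b
    {"a": 1, "b": 5, "c": 3},          # disagrees, so it is excluded
]
prediction = predict_user_user(user_ratings, others, "c")
```

Only the first neighbor contributes here: it rated item c one point above its own mean, so the prediction is one point above the target user's mean.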

save_model(model, path)

Save a trained model to a file or directory. The default implementation pickles the model.

Algorithms are allowed to use any format for saving their models, including directories.

Parameters:
  • model – the trained model.
  • path (str) – the path at which to save the model.
train(ratings)

“Train” a user-user CF model. This memorizes the rating data in a format that is usable for future computations.

Parameters: ratings (pandas.DataFrame) – (user, item, rating) data for collaborative filtering.
Returns: a memorized model for efficient user-based CF computation.
Return type: UUModel
class lenskit.algorithms.user_knn.UUModel

Memorized data for user-user collaborative filtering.

matrix

normalized user-item rating matrix.

Type: matrix.CSR
users

index of user IDs.

Type: pandas.Index
user_means

user mean ratings.

Type: numpy.ndarray
items

index of item IDs.

Type: pandas.Index
transpose

the transposed rating matrix (with data transformations but without L2 normalization).

Type: matrix.CSR
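
The relationship between matrix, user_means, and transpose can be sketched with plain dicts. This assumes the data transformation is mean-centering of each user's ratings, which this page does not state explicitly; the function name memorize and the dict layout are hypothetical stand-ins for the CSR-based model.

```python
import math
from collections import defaultdict


def memorize(ratings):
    """Build a dict-based stand-in for memorized user-user data."""
    by_user = defaultdict(dict)
    for user, item, rating in ratings:
        by_user[user][item] = rating

    user_means = {}
    matrix = {}                    # user -> {item: centered, L2-normalized}
    transpose = defaultdict(dict)  # item -> {user: centered only}
    for user, row in by_user.items():
        mean = sum(row.values()) / len(row)
        user_means[user] = mean
        centered = {item: r - mean for item, r in row.items()}

        # The transpose keeps centered values without L2 normalization,
        # so neighbor ratings can be averaged on the original scale.
        for item, value in centered.items():
            transpose[item][user] = value

        # Unit-length rows make a dot product equal to cosine similarity.
        norm = math.sqrt(sum(v * v for v in centered.values()))
        matrix[user] = {item: (v / norm if norm > 0 else 0.0)
                        for item, v in centered.items()}

    return {"matrix": matrix, "user_means": user_means,
            "transpose": dict(transpose)}


model = memorize([("u1", "a", 5), ("u1", "b", 3), ("u2", "a", 4)])
```

Keeping both forms trades memory for speed: similarities come from the normalized rows, while the per-item view supplies the neighbors' centered ratings for the weighted average.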