Classic Matrix Factorization

LKPY provides classical matrix factorization implementations.

Common Support

The mf_common module contains common support code for matrix factorization algorithms.

class lenskit.algorithms.mf_common.MFModel(users, items, umat, imat)

Common model for matrix factorization.

user_index

Users in the model (length=\(m\)).

Type:pandas.Index
item_index

Items in the model (length=\(n\)).

Type:pandas.Index
user_features

The \(m \times k\) user-feature matrix.

Type:numpy.ndarray
item_features

The \(n \times k\) item-feature matrix.

Type:numpy.ndarray
lookup_items(items)

Look up the indices for a set of items.

Parameters:items (array-like) – the item IDs to look up.
Returns:the item indices. Unknown items will have negative indices.
Return type:numpy.ndarray
lookup_user(user)

Look up the index for a user.

Parameters:user – the user ID to look up
Returns:the user index.
Return type:int
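
The lookup methods translate external IDs into row positions in the feature matrices. The following is a minimal sketch of that mapping using plain pandas, assuming the model's indices behave like ordinary pandas.Index objects. The ID values are made up for illustration, and this is not the library's own code.

```python
import pandas as pd

# Hypothetical item and user IDs; pandas.Index accepts any hashable ID type.
item_index = pd.Index([10, 20, 30, 40], name="item")
user_index = pd.Index(["alice", "bob"], name="user")

# Index.get_indexer returns -1 for IDs that are not in the index, which
# matches the "negative index for unknown items" convention described above.
item_pos = item_index.get_indexer([30, 99, 10])

# Index.get_loc returns the position of a single known ID
# (it raises KeyError for an unknown one).
user_pos = user_index.get_loc("bob")

# Mask of the items that can actually be scored.
known = item_pos >= 0
```

Filtering on the mask before scoring avoids accidentally indexing the feature matrix with -1, which NumPy would silently interpret as the last row.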
n_features

The number of features.

n_items

The number of items.

n_users

The number of users.

score(user, items)

Score a set of items for a user. User and item parameters must be indices into the matrices.

Parameters:
  • user (int) – the user index
  • items (array-like of int) – the item indices
  • raw (bool) – if True, return raw scores without biases added back.
Returns:

the scores for the items.

Return type:

numpy.ndarray
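
Conceptually, an unbiased score is the inner product of one user-feature row with each selected item-feature row. The sketch below illustrates this with random matrices; the standalone `score` function and the matrix sizes are assumptions for illustration, not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(42)
m, n, k = 3, 5, 2                        # users, items, latent features
user_features = rng.normal(size=(m, k))  # m x k user-feature matrix
item_features = rng.normal(size=(n, k))  # n x k item-feature matrix

def score(user, items):
    """Score item rows `items` for user row `user` (both are matrix indices)."""
    items = np.asarray(items, dtype=np.int64)
    # Select the item rows, then take inner products with the user's vector.
    return item_features[items] @ user_features[user]

scores = score(1, [0, 3, 4])
```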

class lenskit.algorithms.mf_common.BiasMFModel(users, items, bias, umat, imat)

Common model for biased matrix factorization.

user_index

Users in the model (length=\(m\)).

Type:pandas.Index
item_index

Items in the model (length=\(n\)).

Type:pandas.Index
global_bias

The global bias term.

Type:double
user_bias

The user bias terms.

Type:numpy.ndarray
item_bias

The item bias terms.

Type:numpy.ndarray
user_features

The \(m \times k\) user-feature matrix.

Type:numpy.ndarray
item_features

The \(n \times k\) item-feature matrix.

Type:numpy.ndarray
score(user, items, raw=False)

Score a set of items for a user. User and item parameters must be indices into the matrices.

Parameters:
  • user (int) – the user index
  • items (array-like of int) – the item indices
  • raw (bool) – if True, return raw scores without biases added back.
Returns:

the scores for the items.

Return type:

numpy.ndarray
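
The effect of the raw flag can be illustrated with a small sketch. It assumes the usual additive form of biased matrix factorization (global bias plus user bias plus item bias plus the feature inner product); the numbers and the standalone `score` function are hypothetical, and the library's code may differ in detail.

```python
import numpy as np

global_bias = 3.5
user_bias = np.array([0.2, -0.4])
item_bias = np.array([0.1, 0.0, -0.3])
user_features = np.array([[1.0, 0.5], [0.0, 2.0]])
item_features = np.array([[0.5, 0.5], [1.0, -1.0], [0.2, 0.1]])

def score(user, items, raw=False):
    """Score items for a user; with raw=True the bias terms are left out."""
    items = np.asarray(items, dtype=np.int64)
    result = item_features[items] @ user_features[user]
    if not raw:
        # Add the bias terms back to obtain scores on the rating scale.
        result = result + global_bias + user_bias[user] + item_bias[items]
    return result

biased = score(0, [0, 2])               # 0.75 + 3.5 + 0.2 + 0.1, 0.25 + 3.5 + 0.2 - 0.3
unbiased = score(0, [0, 2], raw=True)   # inner products only
```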

Alternating Least Squares

LensKit provides alternating least squares implementations of matrix factorization suitable for explicit feedback data. These implementations are parallelized with Numba, and perform best with the MKL from Conda.

class lenskit.algorithms.als.BiasedMF(features, iterations=20, reg=0.1, damping=5)

Biased matrix factorization trained with alternating least squares [ZWSP2008]. This is a prediction-oriented algorithm suitable for explicit feedback data.

[ZWSP2008] Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan. 2008. Large-Scale Parallel Collaborative Filtering for the Netflix Prize. In _Algorithmic Aspects in Information and Management_, LNCS 5034, 337–348. DOI 10.1007/978-3-540-68880-8_32.
Parameters:
  • features (int) – the number of features to train
  • iterations (int) – the number of iterations to train
  • reg (double) – the regularization factor
  • damping (double) – damping factor for the underlying mean
features

the number of features.

Type:int
iterations

the number of training iterations.

Type:int
regularization

the regularization factor.

Type:double
damping

the mean damping.

Type:double
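
To give a sense of what mean damping does, the sketch below computes damped item offsets, assuming the common form in which the sum of residuals is divided by the count plus the damping value. Whether the library's bias model uses exactly this formula is an assumption here, and the data frame is made up.

```python
import pandas as pd

ratings = pd.DataFrame({
    "user": [1, 1, 2, 2, 2, 3],
    "item": [10, 20, 10, 20, 30, 30],
    "rating": [4.0, 5.0, 3.0, 2.0, 4.0, 1.0],
})
damping = 5.0

global_mean = ratings["rating"].mean()
resid = ratings["rating"] - global_mean

# Damped item offset: sum of residuals / (number of ratings + damping).
groups = resid.groupby(ratings["item"])
item_bias = groups.sum() / (groups.count() + damping)

# Undamped offsets, for comparison.
undamped = groups.mean()
```

Damping pulls offsets for rarely rated items toward zero, so an item with only one or two ratings does not receive an extreme bias.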
load_model(path)

Load a trained model from a file.

Parameters:path (str) – the path to the file from which to load the model.
Returns:the re-loaded model (of an implementation-defined type).
predict(model: lenskit.algorithms.mf_common.BiasMFModel, user, items, ratings=None)

Compute predictions for a user and items.

Parameters:
  • model – the trained model to use. Either None or the ratings matrix if the algorithm has no concept of training.
  • user – the user ID
  • items (array-like) – the items to predict
  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.
Returns:

scores for the items, indexed by item id.

Return type:

pandas.Series

save_model(model, path)

Save a trained model to a file or directory. The default implementation pickles the model.

Algorithms are allowed to use any format for saving their models, including directories.

Parameters:
  • model – the trained model.
  • path (str) – the path at which to save the model.
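
Since the default implementation is described as pickling the model, a save and load round trip might look roughly like the following. The dictionary is a stand-in for a trained model, and the file name is arbitrary; this is a sketch with the standard library, not the library's own code.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in for a trained model: any picklable object holding the matrices.
model = {"user_features": np.ones((2, 3)), "item_features": np.zeros((4, 3))}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    # Save: serialize the model object to the given path.
    with open(path, "wb") as f:
        pickle.dump(model, f)
    # Load: read the same object back.
    with open(path, "rb") as f:
        restored = pickle.load(f)
```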
train(ratings, bias=None)

Run ALS to train a model.

Parameters:
  • ratings – the ratings data frame.
  • bias (bias.BiasModel) – a pre-trained bias model to use.
Returns:

The trained biased MF model.

Return type:

BiasMFModel
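
The alternating structure of ALS can be sketched in a few lines of NumPy. This is a simplified, dense, single-threaded illustration that assumes the count-weighted regularization of [ZWSP2008] and subtracts only a global mean; the function names are hypothetical, and the library's Numba implementation will differ.

```python
import numpy as np

def als_sweep(R, mask, fixed, reg):
    """Solve for one side's features while the other side (`fixed`) is held constant."""
    k = fixed.shape[1]
    out = np.zeros((R.shape[0], k))
    for r in range(R.shape[0]):
        obs = mask[r]
        if not obs.any():
            continue                       # no data: leave the row at zero
        M = fixed[obs]                     # features of the observed columns
        A = M.T @ M + reg * obs.sum() * np.eye(k)
        b = M.T @ R[r, obs]
        out[r] = np.linalg.solve(A, b)     # regularized least squares
    return out

def objective(R, mask, P, Q, reg):
    """Squared error on observed cells plus count-weighted L2 penalties."""
    err = (R - P @ Q.T)[mask]
    nu, ni = mask.sum(axis=1), mask.sum(axis=0)
    penalty = nu @ (P ** 2).sum(axis=1) + ni @ (Q ** 2).sum(axis=1)
    return float((err ** 2).sum() + reg * penalty)

R = np.array([[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [0, 1, 5, 4]], dtype=float)
mask = R > 0                               # 0 marks a missing rating
Rc = np.where(mask, R - R[mask].mean(), 0.0)   # subtract the global mean

k, reg = 2, 0.1
rng = np.random.default_rng(0)
Q = rng.normal(scale=0.1, size=(R.shape[1], k))
P = als_sweep(Rc, mask, Q, reg)
before = objective(Rc, mask, P, Q, reg)
for _ in range(20):
    Q = als_sweep(Rc.T, mask.T, P, reg)    # fix users, solve items
    P = als_sweep(Rc, mask, Q, reg)        # fix items, solve users
after = objective(Rc, mask, P, Q, reg)
```

Each sweep solves its rows exactly, so the regularized objective cannot increase from one sweep to the next.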

class lenskit.algorithms.als.ImplicitMF(features, iterations=20, reg=0.1, weight=40)

Implicit matrix factorization trained with alternating least squares [HKV2008]. This algorithm outputs ‘predictions’, but they are not on a meaningful scale. If its input data contains rating values, these will be used as the ‘confidence’ values; otherwise, confidence will be 1 for every rated item.

[HKV2008] Y. Hu, Y. Koren, and C. Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In _Proceedings of the 2008 Eighth IEEE International Conference on Data Mining_, 263–272. DOI 10.1109/ICDM.2008.22.
Parameters:
  • features (int) – the number of features to train
  • iterations (int) – the number of iterations to train
  • reg (double) – the regularization factor
  • weight (double) – the scaling weight for positive samples (\(\alpha\) in [HKV2008]).
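
In [HKV2008], each observed value \(r_{ui}\) is split into a binary preference and a confidence \(c_{ui} = 1 + \alpha r_{ui}\). The sketch below shows that transformation on a made-up interaction matrix; how the library applies the weight internally is not shown here and may differ.

```python
import numpy as np

alpha = 40.0   # corresponds to the `weight` parameter

# Rows are users, columns are items; 0 means no interaction was observed.
counts = np.array([[3.0, 0.0, 1.0],
                   [0.0, 2.0, 0.0]])

# Binary preference: 1 where the user interacted with the item.
preference = (counts > 0).astype(float)

# Confidence grows linearly with the observed value; unobserved cells keep 1.
confidence = 1.0 + alpha * counts
```

With data that carries no rating values, every observed cell would have the same value of 1, so all observed cells share one confidence level.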
load_model(path)

Load a trained model from a file.

Parameters:path (str) – the path to the file from which to load the model.
Returns:the re-loaded model (of an implementation-defined type).
predict(model: lenskit.algorithms.mf_common.MFModel, user, items, ratings=None)

Compute predictions for a user and items.

Parameters:
  • model – the trained model to use. Either None or the ratings matrix if the algorithm has no concept of training.
  • user – the user ID
  • items (array-like) – the items to predict
  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.
Returns:

scores for the items, indexed by item id.

Return type:

pandas.Series

save_model(model, path)

Save a trained model to a file or directory. The default implementation pickles the model.

Algorithms are allowed to use any format for saving their models, including directories.

Parameters:
  • model – the trained model.
  • path (str) – the path at which to save the model.
train(ratings)

Train the model on rating/consumption data. Training methods that require additional data may accept it as additional parameters or via class members.

Parameters:ratings (pandas.DataFrame) – rating data, as a matrix with columns ‘user’, ‘item’, and ‘rating’. The user and item identifiers may be of any type.
Returns:the trained model (of an implementation-defined type).

FunkSVD

FunkSVD is an SVD-like matrix factorization that uses stochastic gradient descent, configured much like coordinate descent, to train the user-feature and item-feature matrices.

class lenskit.algorithms.funksvd.FunkSVD(features, iterations=100, lrate=0.001, reg=0.015, damping=5, range=None)

Algorithm class implementing FunkSVD matrix factorization.

Parameters:
  • features (int) – the number of features to train
  • iterations (int) – the number of iterations to train each feature
  • lrate (double) – the learning rate
  • reg (double) – the regularization factor
  • damping (double) – damping factor for the underlying mean
  • range (tuple) – the (min, max) rating values to which predictions are clamped, or None to leave predictions unclamped.
load_model(path)

Load a trained model from a file.

Parameters:path (str) – the path to the file from which to load the model.
Returns:the re-loaded model (of an implementation-defined type).
predict(model, user, items, ratings=None)

Compute predictions for a user and items.

Parameters:
  • model – the trained model to use. Either None or the ratings matrix if the algorithm has no concept of training.
  • user – the user ID
  • items (array-like) – the items to predict
  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.
Returns:

scores for the items, indexed by item id.

Return type:

pandas.Series

save_model(model, path)

Save a trained model to a file or directory. The default implementation pickles the model.

Algorithms are allowed to use any format for saving their models, including directories.

Parameters:
  • model – the trained model.
  • path (str) – the path at which to save the model.
train(ratings, bias=None)

Train a FunkSVD model.

Parameters:
  • ratings – the ratings data frame.
  • bias (bias.BiasModel) – a pre-trained bias model to use.
Returns:

The trained biased MF model.
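
The feature-at-a-time training scheme can be sketched as follows. This is a simplified illustration of Funk-style stochastic gradient descent on bias-removed ratings, not the library's implementation: `train_funksvd` and `rating_range` are hypothetical names, the initial value of 0.1 is an assumption, and the toy run uses a larger learning rate than the documented default because the data set is tiny.

```python
import numpy as np

def train_funksvd(users, items, ratings, n_users, n_items, features=2,
                  iterations=100, lrate=0.001, reg=0.015, rating_range=None):
    """Train one feature at a time with per-sample gradient updates."""
    P = np.full((n_users, features), 0.1)   # user-feature matrix
    Q = np.full((n_items, features), 0.1)   # item-feature matrix
    base = np.zeros(len(ratings))           # contribution of finished features
    for f in range(features):
        for _ in range(iterations):
            for s in range(len(ratings)):
                u, i = users[s], items[s]
                pred = base[s] + P[u, f] * Q[i, f]
                if rating_range is not None:
                    pred = min(max(pred, rating_range[0]), rating_range[1])
                err = ratings[s] - pred
                puf, qif = P[u, f], Q[i, f]
                # Regularized gradient step on both feature values.
                P[u, f] += lrate * (err * qif - reg * puf)
                Q[i, f] += lrate * (err * puf - reg * qif)
        # Freeze this feature and fold it into the running prediction.
        base += P[users, f] * Q[items, f]
    return P, Q

# Toy residual ratings generated from a known rank-1 structure.
p_true = np.array([1.0, -1.0, 0.5])
q_true = np.array([1.0, -0.5, 0.8])
users, items = np.meshgrid(np.arange(3), np.arange(3), indexing="ij")
users, items = users.ravel(), items.ravel()
resid = p_true[users] * q_true[items]

P, Q = train_funksvd(users, items, resid, 3, 3,
                     features=2, iterations=200, lrate=0.05)
sse_before = float(np.sum(resid ** 2))
sse_after = float(np.sum((resid - np.sum(P[users] * Q[items], axis=1)) ** 2))
```

Training each feature to completion before starting the next is what gives the procedure its resemblance to coordinate descent.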