Classic Matrix Factorization

LKPY provides classical matrix factorization implementations.

Common Support

The mf_common module contains common support code for matrix factorization algorithms.

class lenskit.algorithms.mf_common.MFPredictor

Common predictor for matrix factorization.

user_index_

Users in the model (length = \(m\)).

Type

pandas.Index

item_index_

Items in the model (length = \(n\)).

Type

pandas.Index

user_features_

The \(m \times k\) user-feature matrix.

Type

numpy.ndarray

item_features_

The \(n \times k\) item-feature matrix.

Type

numpy.ndarray

lookup_items(items)

Look up the indices for a set of items.

Parameters

items (array-like) – the item IDs to look up.

Returns

the item indices. Unknown items will have negative indices.

Return type

numpy.ndarray
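The negative-index convention for unknown items can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: the `item_index` values and the free-standing `lookup_items` helper are hypothetical stand-ins for `item_index_` and the method above, and it assumes the lookup behaves like `pandas.Index.get_indexer`.

```python
import numpy as np
import pandas as pd

# Hypothetical item index, standing in for item_index_ on a fitted model.
item_index = pd.Index([10, 20, 30, 40], name="item")


def lookup_items(index, items):
    """Map item IDs to positions in the index; unknown IDs map to -1."""
    return index.get_indexer(np.asarray(items))


positions = lookup_items(item_index, [30, 10, 99])
known = positions >= 0  # mask selecting the items the model has seen
```

A caller would typically use a mask like `known` to score only the items with non-negative indices and leave the rest as missing values.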

lookup_user(user)

Look up the index for a user.

Parameters

user – the user ID to look up

Returns

the user index.

Return type

int

n_features

The number of features.

n_items

The number of items.

n_users

The number of users.

score(user, items)

Score a set of items for a user. User and item parameters must be indices into the matrices.

Parameters
  • user (int) – the user index

  • items (array-like of int) – the item indices

  • raw (bool) – if True, return raw scores without adding the biases back.

Returns

the scores for the items.

Return type

numpy.ndarray
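In a plain factorization model, scoring amounts to a dot product between one row of the user-feature matrix and the selected rows of the item-feature matrix. The sketch below shows that computation with NumPy; the toy matrices and the free-standing `score` function are illustrative assumptions, not the library's code.

```python
import numpy as np

# Toy factor matrices standing in for user_features_ (m x k) and
# item_features_ (n x k); the values are made up for illustration.
user_features = np.array([[1.0, 0.5],
                          [0.0, 2.0]])   # m = 2 users, k = 2 features
item_features = np.array([[1.0, 1.0],
                          [2.0, 0.0],
                          [0.5, 0.5]])   # n = 3 items, k = 2 features


def score(user, items):
    """Dot product of one user's vector with each requested item's vector."""
    items = np.asarray(items)
    return item_features[items, :] @ user_features[user, :]


scores = score(0, [0, 2])  # user index 0, item indices 0 and 2
```

Both arguments are matrix indices, as the method description requires, so IDs would first be translated with the lookup methods.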

class lenskit.algorithms.mf_common.BiasMFPredictor

Common model for biased matrix factorization.

user_index_

Users in the model (length = \(m\)).

Type

pandas.Index

item_index_

Items in the model (length = \(n\)).

Type

pandas.Index

global_bias_

The global bias term.

Type

double

user_bias_

The user bias terms.

Type

numpy.ndarray

item_bias_

The item bias terms.

Type

numpy.ndarray

user_features_

The \(m \times k\) user-feature matrix.

Type

numpy.ndarray

item_features_

The \(n \times k\) item-feature matrix.

Type

numpy.ndarray

score(user, items, raw=False)

Score a set of items for a user. User and item parameters must be indices into the matrices.

Parameters
  • user (int) – the user index

  • items (array-like of int) – the item indices

  • raw (bool) – if True, return raw scores without adding the biases back.

Returns

the scores for the items.

Return type

numpy.ndarray
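The effect of `raw` can be sketched as follows. This example assumes the usual biased-MF form, in which the global, user, and item bias terms are added to the factor dot product; the toy values and the free-standing `score` function are hypothetical and are not taken from the library.

```python
import numpy as np

# Toy model parameters standing in for the attributes documented above.
global_bias = 3.5
user_bias = np.array([0.2, -0.1])
item_bias = np.array([0.3, -0.4, 0.0])
user_features = np.array([[1.0, 0.5],
                          [0.0, 2.0]])
item_features = np.array([[1.0, 1.0],
                          [2.0, 0.0],
                          [0.5, 0.5]])


def score(user, items, raw=False):
    """Factor dot product, optionally with the bias terms added back."""
    items = np.asarray(items)
    rv = item_features[items, :] @ user_features[user, :]
    if not raw:
        # Assumed form: global + user + item bias on top of the raw score.
        rv = rv + global_bias + user_bias[user] + item_bias[items]
    return rv


raw_scores = score(0, [0, 2], raw=True)
full_scores = score(0, [0, 2])
```

With `raw=True` only the latent-factor contribution is returned, which is useful when the caller applies its own bias model.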

Alternating Least Squares

LensKit provides alternating least squares implementations of matrix factorization suitable for explicit feedback data. These implementations are parallelized with Numba, and perform best with the MKL from Conda.

class lenskit.algorithms.als.BiasedMF(features, *, iterations=20, reg=0.1, damping=5, bias=True)

Bases: lenskit.algorithms.mf_common.BiasMFPredictor

Biased matrix factorization trained with alternating least squares [ZWSP2008]. This is a prediction-oriented algorithm suitable for explicit feedback data.

ZWSP2008

Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan. 2008. Large-Scale Parallel Collaborative Filtering for the Netflix Prize. In _Algorithmic Aspects in Information and Management_, LNCS 5034, 337–348. DOI 10.1007/978-3-540-68880-8_32.

Parameters
  • features (int) – the number of features to train

  • iterations (int) – the number of iterations to train

  • reg (double) – the regularization factor

  • damping (double) – damping factor for the underlying mean
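To illustrate what a damping factor does to a mean, the sketch below computes damped item biases with pandas. It assumes the common formulation in which each bias is the sum of residuals divided by the count plus the damping term; the data frame and variable names are made up for illustration, and the library's bias model may differ in detail.

```python
import pandas as pd

# Toy ratings frame with the usual user / item / rating columns.
ratings = pd.DataFrame({
    "user": [1, 1, 2, 2, 3],
    "item": [10, 20, 10, 30, 10],
    "rating": [4.0, 5.0, 3.0, 2.0, 5.0],
})
damping = 5.0

global_mean = ratings["rating"].mean()
resid = ratings["rating"] - global_mean

# Damped mean: sum of residuals over (count + damping). Items with few
# ratings are pulled strongly toward zero, i.e. toward the global mean.
grouped = resid.groupby(ratings["item"])
item_bias = grouped.sum() / (grouped.count() + damping)
```

With `damping = 0` this reduces to the ordinary per-item mean residual; larger values shrink the biases of rarely rated items.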

fit(ratings)

Run ALS to train a model.

Parameters

ratings – the ratings data frame.

Returns

The algorithm (for chaining).
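The alternating structure of ALS can be sketched on a small dense matrix. This is a simplified illustration under stated assumptions, not the library's Numba implementation: it omits the bias terms, uses plain ridge regularization, and the function name `als_sketch` and the boolean `mask` of observed entries are hypothetical.

```python
import numpy as np


def als_sketch(ratings, mask, features, iterations=20, reg=0.1, seed=0):
    """Toy ALS on a dense ratings matrix; mask marks the observed entries."""
    rng = np.random.default_rng(seed)
    m, n = ratings.shape
    P = rng.normal(scale=0.1, size=(m, features))  # user-feature matrix
    Q = rng.normal(scale=0.1, size=(n, features))  # item-feature matrix
    reg_eye = reg * np.eye(features)

    def solve_rows(target, fixed, obs, vals):
        # Ridge-regress each row's observed values on the fixed factors.
        for i in range(target.shape[0]):
            cols = np.flatnonzero(obs[i])
            if cols.size == 0:
                continue  # nothing observed; leave the row unchanged
            F = fixed[cols]
            target[i] = np.linalg.solve(F.T @ F + reg_eye, F.T @ vals[i, cols])

    for _ in range(iterations):
        solve_rows(P, Q, mask, ratings)      # update users with items fixed
        solve_rows(Q, P, mask.T, ratings.T)  # update items with users fixed
    return P, Q


ratings = np.array([[5.0, 3.0, 0.0],
                    [4.0, 0.0, 1.0],
                    [0.0, 2.0, 5.0]])
mask = ratings > 0  # zeros are treated as unobserved in this toy example
P, Q = als_sketch(ratings, mask, features=2)
rmse = np.sqrt(np.mean((P @ Q.T - ratings)[mask] ** 2))
```

Each half-step is a set of independent least-squares problems, one per user or item, which is what makes the method straightforward to parallelize.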

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters
  • user – the user ID

  • items (array-like) – the items to predict

  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.

Returns

scores for the items, indexed by item id.

Return type

pandas.Series

class lenskit.algorithms.als.ImplicitMF(features, *, iterations=20, reg=0.1, weight=40)

Bases: lenskit.algorithms.mf_common.MFPredictor

Implicit matrix factorization trained with alternating least squares [HKV2008]. This algorithm outputs ‘predictions’, but they are not on a meaningful scale. If its input data contains rating values, these will be used as the ‘confidence’ values; otherwise, confidence will be 1 for every rated item.

HKV2008

Y. Hu, Y. Koren, and C. Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In _Proceedings of the 2008 Eighth IEEE International Conference on Data Mining_, 263–272. DOI 10.1109/ICDM.2008.22.

Parameters
  • features (int) – the number of features to train

  • iterations (int) – the number of iterations to train

  • reg (double) – the regularization factor

  • weight (double) – the scaling weight for positive samples (\(\alpha\) in [HKV2008]).
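The role of the weight can be illustrated with the confidence formulation from [HKV2008], where an observed value \(r\) yields confidence \(c = 1 + \alpha r\) and a binary preference. The helper names below are hypothetical, and the sketch follows the paper's formulation; how the library applies the weight internally is not shown here.

```python
import numpy as np


def confidence(values, weight=40.0):
    """Confidence c = 1 + weight * r; unobserved entries (r = 0) get 1."""
    return 1.0 + weight * np.asarray(values, dtype=float)


def preference(values):
    """Binary preference: 1 where an interaction was observed, else 0."""
    return (np.asarray(values) > 0).astype(float)


counts = np.array([0, 1, 3])  # toy interaction counts for one user
c = confidence(counts)
p = preference(counts)
```

A larger weight makes observed interactions count more heavily relative to the unobserved entries, which all keep a baseline confidence of 1.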

fit(ratings)

Train a model using the specified ratings (or similar) data.

Parameters
  • ratings (pandas.DataFrame) – The ratings data.

  • args – Additional training data the algorithm may require.

  • kwargs – Additional training data the algorithm may require.

Returns

The algorithm object.

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters
  • user – the user ID

  • items (array-like) – the items to predict

  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.

Returns

scores for the items, indexed by item id.

Return type

pandas.Series

FunkSVD

FunkSVD is an SVD-like matrix factorization that uses stochastic gradient descent, configured much like coordinate descent, to train the user-feature and item-feature matrices.

class lenskit.algorithms.funksvd.FunkSVD(features, iterations=100, *, lrate=0.001, reg=0.015, damping=5, range=None, bias=True)

Bases: lenskit.algorithms.mf_common.BiasMFPredictor

Algorithm class implementing FunkSVD matrix factorization.

Parameters
  • features (int) – the number of features to train

  • iterations (int) – the number of iterations to train each feature

  • lrate (double) – the learning rate

  • reg (double) – the regularization factor

  • damping (double) – damping factor for the underlying mean

  • bias (Predictor) – the underlying bias model to fit. If True, then a basic.Bias model is fit with damping.

  • range (tuple) – the (min, max) rating values used to clamp predictions, or None to leave them unclamped.

fit(ratings)

Train a FunkSVD model.

Parameters

ratings – the ratings data frame.
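The feature-at-a-time training loop can be sketched in pure Python and NumPy. This is a simplified illustration under stated assumptions, not the library's code: it trains on bias-removed residuals, initializes every factor to 0.1, predicts from the features trained so far, and uses the hypothetical names `funksvd_sketch` and `clamp` (the latter standing in for the `range` parameter).

```python
import numpy as np


def funksvd_sketch(users, items, values, n_users, n_items, features,
                   iterations=100, lrate=0.001, reg=0.015, clamp=None):
    """Train one feature at a time with regularized SGD updates."""
    P = np.full((n_users, features), 0.1)  # user-feature matrix
    Q = np.full((n_items, features), 0.1)  # item-feature matrix
    for f in range(features):
        for _ in range(iterations):
            for u, i, r in zip(users, items, values):
                # Predict from the features trained so far (0..f).
                pred = P[u, :f + 1] @ Q[i, :f + 1]
                if clamp is not None:
                    pred = min(max(pred, clamp[0]), clamp[1])
                err = r - pred
                puf, qif = P[u, f], Q[i, f]
                # Gradient step on feature f only, with L2 regularization.
                P[u, f] += lrate * (err * qif - reg * puf)
                Q[i, f] += lrate * (err * puf - reg * qif)
    return P, Q


users = np.array([0, 0, 1, 1])
items = np.array([0, 1, 0, 1])
resid = np.array([1.0, 0.5, 0.5, 1.0])  # toy residuals after removing biases
P, Q = funksvd_sketch(users, items, resid, 2, 2, features=2,
                      iterations=200, lrate=0.01)
pred = np.sum(P[users] * Q[items], axis=1)
sse = float(np.sum((resid - pred) ** 2))
```

The toy call uses a larger learning rate than the documented default so that the tiny data set makes visible progress in a few hundred epochs.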

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters
  • user – the user ID

  • items (array-like) – the items to predict

  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.

Returns

scores for the items, indexed by item id.

Return type

pandas.Series