Prediction Accuracy Metrics

The lenskit.metrics.predict module contains prediction accuracy metrics. These are intended to be used as part of a Pandas split-apply-combine operation on a data frame that contains both predictions and ratings; for convenience, the lenskit.batch.predict() function will include ratings in the prediction frame when its input user-item pairs contain ratings. For example, the following computes per-user RMSE over a set of predictions:

from lenskit.batch import predict
from lenskit.metrics.predict import rmse
preds = predict(algo, pairs)
user_rmse = preds.groupby('user').apply(lambda df: rmse(df.prediction, df.rating))
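
As a self-contained illustration of this split-apply-combine pattern, the sketch below runs the same kind of groupby on a small made-up frame. The rmse_sketch helper and the toy data are assumptions for illustration only; the helper stands in for lenskit.metrics.predict.rmse so that the example runs without LensKit installed.

```python
import numpy as np
import pandas as pd


def rmse_sketch(predictions, truth):
    """Illustrative stand-in for lenskit.metrics.predict.rmse (not the library code)."""
    diff = predictions - truth
    return float(np.sqrt((diff ** 2).mean()))


# Toy prediction frame with the same column names used in the text (made-up values)
preds = pd.DataFrame({
    'user': [1, 1, 2, 2],
    'item': [10, 11, 10, 12],
    'prediction': [4.0, 3.0, 2.0, 5.0],
    'rating': [5.0, 3.0, 4.0, 5.0],
})

# Split by user, apply the metric to each user's rows, combine into one Series
user_rmse = (
    preds.groupby('user')[['prediction', 'rating']]
    .apply(lambda df: rmse_sketch(df.prediction, df.rating))
)
```

The result is a Series indexed by user, with one RMSE value per user, which can then be averaged or plotted.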

Metric Functions

Prediction metric functions take two series, predictions and truth.

lenskit.metrics.predict.rmse(predictions, truth, missing='error')

Compute RMSE (root mean squared error).

Parameters

predictions (pandas.Series) – the predictions
truth (pandas.Series) – the ground truth ratings from data
missing (string) – how to handle predictions without truth. Can be one of 'error' or 'ignore'.

Returns

the root mean squared approximation error

Return type

double

lenskit.metrics.predict.mae(predictions, truth, missing='error')

Compute MAE (mean absolute error).

Parameters

predictions (pandas.Series) – the predictions
truth (pandas.Series) – the ground truth ratings from data
missing (string) – how to handle predictions without truth. Can be one of 'error' or 'ignore'.

Returns

the mean absolute approximation error

Return type

double
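
To make the two metrics and the missing parameter concrete, here is a minimal sketch of how they could be computed. It is not LensKit's implementation: the names rmse_sketch, mae_sketch, and _paired are hypothetical, and the handling of missing truth (raise on 'error', drop the pair on 'ignore') is an assumption based on the parameter descriptions above.

```python
import numpy as np
import pandas as pd


def _paired(predictions, truth, missing='error'):
    """Align truth to the predictions' index and apply the `missing` policy."""
    if missing not in ('error', 'ignore'):
        raise ValueError("missing must be 'error' or 'ignore'")
    predictions = pd.Series(predictions)
    # Predictions with no matching truth entry get NaN truth after reindexing
    truth = pd.Series(truth).reindex(predictions.index)
    n_missing = int(truth.isna().sum())
    if missing == 'error' and n_missing > 0:
        raise ValueError('missing truth for {} predictions'.format(n_missing))
    return predictions, truth


def rmse_sketch(predictions, truth, missing='error'):
    predictions, truth = _paired(predictions, truth, missing)
    # Drop pairs where either side is missing, then take the root mean square
    diff = (predictions - truth).dropna()
    return float(np.sqrt((diff ** 2).mean()))


def mae_sketch(predictions, truth, missing='error'):
    predictions, truth = _paired(predictions, truth, missing)
    diff = (predictions - truth).dropna()
    return float(diff.abs().mean())


# Made-up data: item 'c' has a prediction but no rating
predictions = pd.Series([4.0, 3.0, 2.0], index=['a', 'b', 'c'])
truth = pd.Series([5.0, 3.0], index=['a', 'b'])

rmse_ignore = rmse_sketch(predictions, truth, missing='ignore')
mae_ignore = mae_sketch(predictions, truth, missing='ignore')
```

With the default missing='error', the same call raises a ValueError in this sketch, because item 'c' has no truth value.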

Working with Missing Data

LensKit rating predictors do not report predictions when their core model is unable to predict. For example, a nearest-neighbor recommender will not score an item if it cannot find any suitable neighbors. Following the Pandas convention, these items are given a score of NaN. When Pandas implements better missing-data handling, LensKit will use that, so test for missing scores with pandas.Series.isna()/pandas.Series.notna() rather than the isnan versions.
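
For instance, missing predictions can be located and counted with isna() and notna(). The frame below is made-up data that assumes the same user, item, prediction, and rating columns as the earlier example.

```python
import numpy as np
import pandas as pd

# Made-up prediction frame; NaN marks a pair the model could not score
preds = pd.DataFrame({
    'user': [1, 1, 2],
    'item': [10, 11, 10],
    'prediction': [4.2, np.nan, 3.1],
    'rating': [4.0, 3.0, 3.5],
})

unscored = preds[preds['prediction'].isna()]    # rows with no prediction
scored = preds[preds['prediction'].notna()]     # rows that can be evaluated
coverage = preds['prediction'].notna().mean()   # fraction of pairs scored
```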

However, this causes problems when computing predictive accuracy: recommenders are not being tested on the same set of items. If a recommender only scores the easy items, for example, it could do much better than a recommender that is willing to attempt more difficult items.
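
The effect is easy to reproduce with made-up numbers. In the sketch below, two hypothetical recommenders make identical predictions on the items both can score, but one declines to score the harder items; the rmse_scored helper is an illustrative stand-in, not a LensKit function.

```python
import numpy as np
import pandas as pd


def rmse_scored(pred, truth):
    """RMSE over only the items that received a prediction (illustrative helper)."""
    diff = (pred - truth).dropna()
    return float(np.sqrt((diff ** 2).mean()))


truth = pd.Series([4.0, 4.0, 1.0, 5.0], index=['a', 'b', 'c', 'd'])

# Cautious recommender: scores only the two easy items
cautious = pd.Series([4.0, 3.5, np.nan, np.nan], index=truth.index)
# Ambitious recommender: same scores on the easy items, plus the hard ones
ambitious = pd.Series([4.0, 3.5, 3.0, 3.0], index=truth.index)

cautious_rmse = rmse_scored(cautious, truth)
ambitious_rmse = rmse_scored(ambitious, truth)

# Restricting both to the items the cautious recommender scored removes the gap
common = cautious.notna()
ambitious_common_rmse = rmse_scored(ambitious[common], truth[common])
```

The cautious recommender appears far more accurate, yet on the items both scored the two are identical; the difference comes entirely from coverage.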

A good solution to this is to use a fallback predictor so that every item has a prediction. In LensKit, lenskit.algorithms.basic.Fallback implements this functionality; it wraps a sequence of recommenders, and for each item, uses the first one that generates a score.

You set it up like this:

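from lenskit.algorithms.basic import Bias, Fallback
from lenskit.algorithms.item_knn import ItemItem

cf = ItemItem(20)          # primary predictor: item-item k-NN with 20 neighbors
base = Bias(damping=5)     # fallback predictor: damped bias model
algo = Fallback(cf, base)  # uses cf where it produces a score, base otherwise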
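
The "first one that generates a score" rule can be illustrated on plain Pandas series. This is only a sketch of the idea, not how lenskit.algorithms.basic.Fallback is implemented; fallback_scores and the score values are hypothetical.

```python
import numpy as np
import pandas as pd


def fallback_scores(*score_series):
    """For each item, keep the first non-missing score, in argument order."""
    result = score_series[0]
    for scores in score_series[1:]:
        # combine_first fills NaN entries in `result` from `scores`
        result = result.combine_first(scores)
    return result


items = ['a', 'b', 'c']
cf_scores = pd.Series([4.1, np.nan, np.nan], index=items)   # primary: gaps for 'b', 'c'
bias_scores = pd.Series([3.8, 3.2, 3.5], index=items)       # fallback: scores everything

combined = fallback_scores(cf_scores, bias_scores)
```

Item 'a' keeps its primary score, while 'b' and 'c' take the fallback's scores, so every item has a prediction and accuracy metrics are computed over the same item set for every algorithm.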