Prediction Accuracy Metrics¶
The lenskit.metrics.predict module contains prediction accuracy metrics.
These are intended to be used as part of a Pandas split-apply-combine operation
on a data frame that contains both predictions and ratings; for convenience, the
lenskit.batch.predict() function will include ratings in the prediction
frame when its input user-item pairs contain ratings. So you can compute
per-user RMSE over some predictions as follows:
preds = predict(algo, pairs)
user_rmse = preds.groupby('user').apply(lambda df: rmse(df.prediction, df.rating))
Prediction metric functions take two series, predictions and truth.
rmse(predictions, truth, missing='error')¶
Compute RMSE (root mean squared error).
Returns the root mean squared approximation error.
mae(predictions, truth, missing='error')¶
Compute MAE (mean absolute error).
Returns the mean absolute approximation error.
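Conceptually, these metrics reduce to simple arithmetic on the aligned prediction and truth series. A minimal sketch in plain pandas (the ratings here are made up for illustration):

```python
import pandas as pd

# Hypothetical predictions and ground-truth ratings for three items.
predictions = pd.Series([3.5, 4.0, 2.5], index=['i1', 'i2', 'i3'])
truth = pd.Series([3.0, 4.5, 2.0], index=['i1', 'i2', 'i3'])

err = predictions - truth

# RMSE: square the errors, average them, take the square root.
rmse_val = (err ** 2).mean() ** 0.5

# MAE: average the absolute errors.
mae_val = err.abs().mean()
```

The library functions add handling for missing values on top of this core computation.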
Working with Missing Data¶
LensKit rating predictors do not report predictions when their core model is unable
to predict. For example, a nearest-neighbor recommender will not score an item if
it cannot find any suitable neighbors. Following the Pandas convention, these items
are given a score of NaN (when Pandas implements better missing-data handling,
LensKit will use that), so test for missing scores with
pandas.Series.notna(), not a direct comparison with NaN.
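Since NaN never compares equal to itself, an equality test silently fails to find missing scores; notna()/isna() are the reliable tests. A short sketch with made-up scores:

```python
import numpy as np
import pandas as pd

# Hypothetical prediction output where one item could not be scored.
preds = pd.Series([3.5, np.nan, 2.0], index=['i1', 'i2', 'i3'])

# NaN != NaN, so `preds == float('nan')` matches nothing.
# Use notna()/isna() instead.
scored = preds[preds.notna()]
n_missing = preds.isna().sum()
```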
However, this causes problems when computing predictive accuracy: recommenders are not being tested on the same set of items. If a recommender only scores the easy items, for example, it could do much better than a recommender that is willing to attempt more difficult items.
A good solution to this is to use a fallback predictor so that every item has a
prediction. In LensKit, the Fallback algorithm provides
this functionality: it wraps a sequence of recommenders and, for each item, uses
the first one that generates a score.
You set it up like this:
cf = ItemItem(20)
base = Bias(damping=5)
algo = Fallback(cf, base)
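Fallback handles the combination when you fit and run the wrapped algorithms; conceptually, it is equivalent to filling one score series' gaps from the next. A plain-pandas sketch of that idea (the scores below are invented, not output from real models):

```python
import numpy as np
import pandas as pd

# Hypothetical item-item scores: one item has no suitable neighbors,
# so it is left unscored (NaN).
cf_scores = pd.Series([4.2, np.nan, 3.1], index=['i1', 'i2', 'i3'])

# Hypothetical bias-baseline scores, which cover every item.
bias_scores = pd.Series([3.5, 3.7, 3.0], index=['i1', 'i2', 'i3'])

# For each item, take the first score that is present.
combined = cf_scores.fillna(bias_scores)
```

With a complete baseline last in the chain, every item ends up with a prediction, so accuracy metrics compare recommenders over the same set of items.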