Batch-Running Recommenders

The functions in lenskit.batch generate recommendations or predictions for many users at once, which is useful for evaluations and experiments.

The batch functions can parallelize over users with the optional n_jobs parameter, or the LK_NUM_PROCS environment variable.
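As a minimal sketch of the two options above: the environment variable can be set from Python before any batch call, or the process count can be requested per call with n_jobs. The value 4 is arbitrary, and the sketch assumes that LK_NUM_PROCS is read when a batch function sets up its workers; setting the variable in the shell before starting Python avoids relying on that assumption.

```python
import os

# Assumption: LK_NUM_PROCS is read when a batch function sets up its worker
# processes, so it is set here before any such call is made.
os.environ["LK_NUM_PROCS"] = "4"

# Alternatively, request the process count for a single call (algo and users
# are placeholders for a fitted recommender and a list of user IDs):
#   recs = lenskit.batch.recommend(algo, users, 10, n_jobs=4)
```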

Note

Scripts that call the batch recommendation or prediction facilities must be protected: they should not perform their work directly when run, but should define functions and call a main function when run as a script, with a block like this at the end of the file:

def main():
    # do the actual work
    pass

if __name__ == '__main__':
    main()

If you are using the batch functions from a Jupyter notebook, you should be fine, because the Jupyter programs are appropriately protected.

Recommendation

lenskit.batch.recommend(algo, users, n, candidates=None, *, n_jobs=None, **kwargs)

Batch-recommend for multiple users. The provided algorithm should be an algorithms.Recommender.

Parameters:
  • algo – the algorithm

  • users (array-like) – the users to recommend for

  • n (int) – the number of recommendations to generate (None for unlimited)

  • candidates – the users’ candidate sets. This can be a function, in which case it will be passed each user ID; it can also be a dictionary, in which case user IDs will be looked up in it. Pass None to use the recommender’s built-in candidate selector (usually recommended).

  • n_jobs (int) – The number of processes to use for parallel recommendations. Passed to lenskit.util.parallel.invoker().

Returns:

A frame with at least the columns user, rank, and item; possibly also score, and any other columns returned by the recommender.
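The two forms of the candidates argument described above can be sketched in plain Python. The rating data, the item universe, and the helper name unrated_candidates are hypothetical, and returning a sorted list is an assumption about what a candidate set may look like; the calls to recommend are shown only as comments because they need an already-fitted recommender.

```python
# Hypothetical data: the items each user has already rated.
rated = {
    1: {10, 20},
    2: {20, 30},
}
all_items = {10, 20, 30, 40}


def unrated_candidates(user):
    """Candidate function: takes a user ID, returns that user's candidate items."""
    return sorted(all_items - rated.get(user, set()))


# Equivalent dictionary form: user IDs are looked up in it.
candidate_dict = {user: unrated_candidates(user) for user in rated}

# Either form could then be passed as the candidates argument, for example:
#   recs = lenskit.batch.recommend(algo, list(rated), 10, unrated_candidates)
#   recs = lenskit.batch.recommend(algo, list(rated), 10, candidate_dict)
# where algo is an already-fitted recommender (not constructed here).
```

Passing None instead lets the recommender pick candidates itself, which the parameter description above notes is usually the better choice.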

Rating Prediction

lenskit.batch.predict(algo, pairs, *, n_jobs=None, **kwargs)

Generate predictions for user-item pairs. The provided algorithm should be an algorithms.Predictor or a function of two arguments: the user ID and a list of item IDs. It should return a dictionary or a pandas.Series mapping item IDs to predictions.

To use this function, provide a pre-fit algorithm:

>>> from lenskit.batch import predict
>>> from lenskit.algorithms.bias import Bias
>>> from lenskit.metrics.predict import rmse
>>> from lenskit.data import from_interactions_df
>>> from lenskit.data.movielens import load_movielens_df
>>> ratings = load_movielens_df('data/ml-latest-small')
>>> bias = Bias()
>>> bias.fit(from_interactions_df(ratings[:-1000]))
<lenskit.algorithms.bias.Bias object at ...>
>>> preds = predict(bias, ratings[-1000:])
>>> preds.head()
       user  item  rating   timestamp  prediction
99004   664  8361     3.0  1393891425    3.288286
99005   664  8528     3.5  1393891047    3.559119
99006   664  8529     4.0  1393891173    3.573008
99007   664  8636     4.0  1393891175    3.846268
99008   664  8641     4.5  1393890852    3.710635
>>> rmse(preds['prediction'], preds['rating'])
0.832699...

Parameters:
  • algo (lenskit.algorithms.Predictor) – A rating predictor function or algorithm.

  • pairs (pandas.DataFrame) – A data frame of (user, item) pairs to predict for. If this frame also contains a rating column, it will be included in the result.

  • n_jobs (int) – The number of processes to use for parallel batch prediction. Passed to lenskit.util.parallel.invoker().

Returns:

A frame with columns user, item, and prediction containing the prediction results. If pairs contains a rating column, the result will also contain a rating column.

Return type:

pandas.DataFrame
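The description above also allows algo to be a plain function of a user ID and a list of item IDs. A minimal sketch of such a function follows; the name mean_predictor and the constants are hypothetical and stand in for a real model.

```python
# Hypothetical model parameters, for illustration only.
GLOBAL_MEAN = 3.5
ITEM_OFFSETS = {8361: -0.25, 8528: 0.5}


def mean_predictor(user, items):
    """Return a dict mapping each item ID to a predicted rating.

    The user ID is accepted to match the two-argument form but is
    not used by this simple item-offset model.
    """
    return {item: GLOBAL_MEAN + ITEM_OFFSETS.get(item, 0.0) for item in items}


# Such a function could be passed in place of a fitted Predictor, e.g.
#   preds = lenskit.batch.predict(mean_predictor, pairs)
# where pairs is a data frame with user and item columns.
```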