Batch-Running Recommenders

The functions in lenskit.batch generate recommendations or predictions for many users in a single call, which is useful for evaluations and experiments.

Recommendation

lenskit.batch.recommend(algo, users, n, candidates, ratings=None, nprocs=None)

Batch-recommend for multiple users. The provided algorithm should be an algorithms.Recommender or an algorithms.Predictor (which will be converted to a top-N recommender).

Parameters:
  • algo – the algorithm
  • users (array-like) – the users to recommend for
  • n (int) – the number of recommendations to generate (None for unlimited)
  • candidates – the users’ candidate sets. This can be a function, in which case it will be passed each user ID; it can also be a dictionary, in which case user IDs will be looked up in it.
  • ratings (pandas.DataFrame) – if not None, a data frame of ratings to attach to recommendations when available.
  • nprocs (int) – the number of processes to use for parallel batch recommendation.
Returns:

A frame with at least the columns user, rank, and item; possibly also score, and any other columns returned by the recommender.
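The two accepted forms of candidates can be illustrated with a small sketch. The toy data and the helper names (rated, all_items, unrated_candidates, candidate_dict, recommend_all) are hypothetical, and returning a sorted list of item IDs is an assumption about what the recommender accepts as a candidate set. The batch call itself is wrapped in a function because it needs LensKit and a trained algorithm.

```python
# Hypothetical toy data: the items each user has already rated.
rated = {
    "u1": {"i1", "i2"},
    "u2": {"i3"},
}
all_items = {"i1", "i2", "i3", "i4"}


def unrated_candidates(user):
    """Function form: called with one user ID, returns that user's candidates."""
    return sorted(all_items - rated.get(user, set()))


# Dictionary form: the same candidates, precomputed and keyed by user ID.
candidate_dict = {user: unrated_candidates(user) for user in rated}


def recommend_all(algo, users, n=10):
    """Sketch of the batch call; requires LensKit and a trained algorithm."""
    from lenskit import batch  # lazy import so the helpers above run without LensKit

    # candidate_dict could be passed here in place of the function.
    return batch.recommend(algo, users, n, unrated_candidates)
```

The function form computes candidates on demand, while the dictionary form trades memory for a fixed, inspectable candidate set per user.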

Rating Prediction

lenskit.batch.predict(algo, pairs, nprocs=None)

Generate predictions for user-item pairs. The provided algorithm should be an algorithms.Predictor or a function of two arguments: the user ID and a list of item IDs. Such a function should return a dictionary or a pandas.Series mapping item IDs to predictions.

Parameters:
  • algo (lenskit.algorithms.Predictor) – A rating predictor function or algorithm.
  • pairs (pandas.DataFrame) – A data frame of (user, item) pairs to predict for. If this frame also contains a rating column, it will be included in the result.
  • nprocs (int) – The number of processes to use for parallel batch prediction.
Returns:

A frame with columns user, item, and prediction containing the prediction results. If pairs contains a rating column, this result will also contain a rating column.

Return type:

pandas.DataFrame
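As a rough illustration of the function form of algo, the sketch below defines a two-argument predictor that returns a dictionary of item IDs to predictions. The item means, GLOBAL_MEAN, mean_predictor, and predict_pairs are all hypothetical names; the batch call is wrapped in a function because it needs LensKit and a pandas.DataFrame of pairs.

```python
# Hypothetical per-item mean ratings standing in for a trained model.
item_means = {"i1": 3.5, "i2": 4.0, "i3": 2.5}
GLOBAL_MEAN = 3.0


def mean_predictor(user, items):
    """Predictor function: (user ID, list of item IDs) -> {item ID: prediction}."""
    # This toy predictor ignores the user and falls back to a global mean
    # for items it has no mean for.
    return {item: item_means.get(item, GLOBAL_MEAN) for item in items}


def predict_pairs(pairs):
    """Sketch of the batch call; pairs is a frame with user and item columns."""
    from lenskit import batch  # lazy import so mean_predictor runs without LensKit

    return batch.predict(mean_predictor, pairs)
```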

Scripting Evaluation

class lenskit.batch.MultiEval(path, predict=True, recommend=100, candidates=<class 'lenskit.topn.UnratedCandidates'>, nprocs=None, combine=True)

A runner for carrying out multiple evaluations, such as parameter sweeps.

Parameters:
  • path (str or pathlib.Path) – the working directory for this evaluation. It will be created if it does not exist.
  • predict (bool) – whether to generate rating predictions.
  • recommend (int) – the number of recommendations to generate per user (None to disable top-N).
  • candidates (function) – the default candidate set generator for recommendations. It should take the training data and return a candidate generator, itself a function mapping user IDs to candidate sets.
  • combine (bool) – whether to combine output; if False, output will be left in separate files; if True, it will be in a single set of files (runs, recommendations, and predictions).
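
The two-level contract described for candidates (training data in, per-user function out) might look like the sketch below. It is not the implementation of lenskit.topn.UnratedCandidates; the name unrated_candidate_factory is hypothetical, and the training data is assumed to expose user and item columns. A plain dict of columns stands in for the training frame so the example runs without pandas.

```python
from collections import defaultdict


def unrated_candidate_factory(train):
    """Take training data and return a function mapping a user ID to candidates.

    train is assumed to expose "user" and "item" columns, as a
    pandas.DataFrame of ratings would.
    """
    all_items = set(train["item"])
    rated = defaultdict(set)
    for user, item in zip(train["user"], train["item"]):
        rated[user].add(item)

    def candidates_for(user):
        # Every training item that this user has not rated.
        return all_items - rated.get(user, set())

    return candidates_for


# A dict of columns stands in for the training frame here.
toy_train = {"user": ["u1", "u1", "u2"], "item": ["i1", "i2", "i3"]}
select = unrated_candidate_factory(toy_train)
```

Building the per-user lookup once in the outer function keeps each later call to the inner function cheap.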
add_algorithms(algos, parallel=False, attrs=[], **kwargs)

Add one or more algorithms to the run.

Parameters:
  • algos (algorithm or list) – the algorithm(s) to add.
  • parallel (bool) – if True, allow this algorithm to be trained in parallel with others.
  • attrs (list of str) – a list of attributes to extract from the algorithm objects and include in the run descriptions.
  • kwargs – additional attributes to include in the run descriptions.
add_datasets(data, name=None, candidates=None, **kwargs)

Add one or more datasets to the run.

Parameters:
  • data

    The input data set(s) to run. Can be one of the following:

    • A tuple of (train, test) data.
    • An iterable of (train, test) pairs, in which case the iterable is not consumed until it is needed.
    • A function yielding either of the above, to defer data load until it is needed.

    Data can be either data frames or paths; paths are loaded with util.read_df_detect(), which detects the file format.

  • kwargs – additional attributes pertaining to these data sets.
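
One way to defer data loading is a generator function, which returns a lazy iterable of (train, test) pairs when called. The sketch below uses hypothetical file names, and load_log exists only to show when each pair is produced. The register helper assumes sweep is a MultiEval instance and is not executed here.

```python
from pathlib import Path

load_log = []  # records which partitions have been produced so far


def load_partitions():
    """Yield (train, test) pairs lazily; nothing happens until iteration starts."""
    for part in range(1, 4):
        load_log.append(part)
        # Hypothetical file names; paths are accepted in place of data frames.
        yield Path(f"data/train-{part}.csv"), Path(f"data/test-{part}.csv")


def register(sweep):
    """Sketch: pass the function itself, not its result, to defer the work."""
    sweep.add_datasets(load_partitions)
```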
collect_results()

Collect the results from non-combined runs into combined output files.

persist_data()

Persist the data for an experiment, replacing in-memory data sets with file names. Once this has been called, the sweep can be pickled.

run(runs=None)

Run the evaluation.

Parameters:
  • runs (int or set-like) – If provided, a specific set of runs to run. Useful for splitting an experiment into individual runs. This is a set of 1-based run IDs, not 0-based indexes.
run_count()

Get the number of runs in this evaluation.
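
Because run() takes 1-based run IDs and run_count() reports the total, an experiment can be divided among several workers. The following is a minimal sketch of one possible round-robin split; runs_for_worker and run_share are hypothetical helpers, not part of LensKit, and sweep is assumed to be a configured MultiEval instance.

```python
def runs_for_worker(run_count, worker, n_workers):
    """Return the 1-based run IDs assigned to one worker (worker is 0-based)."""
    return {
        run
        for run in range(1, run_count + 1)
        if (run - 1) % n_workers == worker
    }


def run_share(sweep, worker, n_workers):
    """Sketch: run only this worker's share of the evaluation."""
    sweep.run(runs_for_worker(sweep.run_count(), worker, n_workers))
```

If the runs are executed with combine set to False, collect_results() can merge the separate outputs afterwards.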