Batch-Running Recommenders
The functions in lenskit.batch
enable you to generate many recommendations or
predictions at the same time, useful for evaluations and experiments.
Recommendation
lenskit.batch.recommend(algo, users, n, candidates=None, *, n_jobs=None, dask_result=False, **kwargs)

Batch-recommend for multiple users. The provided algorithm should be an algorithms.Recommender.

Parameters:

- algo – the algorithm.
- users (array-like) – the users to recommend for.
- n (int) – the number of recommendations to generate (None for unlimited).
- candidates – the users' candidate sets. This can be a function, in which case it will be passed each user ID; it can also be a dictionary, in which case user IDs will be looked up in it. Pass None to use the recommender's built-in candidate selector (usually recommended).
- n_jobs (int) – the number of processes to use for parallel recommendations, passed as n_jobs to joblib.Parallel. The default, None, makes the process sequential unless it is called inside the joblib.parallel_backend() context manager. Note: nprocs is accepted as a deprecated alias.
- dask_result (bool) – whether to return a Dask data frame instead of a Pandas one.

Returns:

A frame with at least the columns user, rank, and item; possibly also score, and any other columns returned by the recommender.
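For example, a minimal sketch (reusing the load_ml_ratings helper that appears elsewhere on this page; the user selection is illustrative):

from lenskit.algorithms.basic import Popular
from lenskit.batch import recommend
from lenskit.util import load_ml_ratings

# Fit a simple popularity recommender on the MovieLens sample data.
ratings = load_ml_ratings()
algo = Popular()
algo.fit(ratings)

# Recommend 10 items each for the first 100 users.  Leaving
# candidates=None uses the recommender's built-in candidate selector.
users = ratings['user'].unique()[:100]
recs = recommend(algo, users, 10)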
Rating Prediction
lenskit.batch.predict(algo, pairs, *, n_jobs=None, **kwargs)

Generate predictions for user-item pairs. The provided algorithm should be an algorithms.Predictor or a function of two arguments: the user ID and a list of item IDs. It should return a dictionary or a pandas.Series mapping item IDs to predictions.

To use this function, provide a pre-fit algorithm:
>>> from lenskit.algorithms.basic import Bias
>>> from lenskit.metrics.predict import rmse
>>> ratings = util.load_ml_ratings()
>>> bias = Bias()
>>> bias.fit(ratings[:-1000])
<lenskit.algorithms.basic.Bias object at ...>
>>> preds = predict(bias, ratings[-1000:])
>>> preds.head()
       user  item  rating   timestamp  prediction
99004   664  8361     3.0  1393891425    3.288286
99005   664  8528     3.5  1393891047    3.559119
99006   664  8529     4.0  1393891173    3.573008
99007   664  8636     4.0  1393891175    3.846268
99008   664  8641     4.5  1393890852    3.710635
>>> rmse(preds['prediction'], preds['rating'])
0.8326992222...
Parameters:

- algo (lenskit.algorithms.Predictor) – a rating predictor function or algorithm.
- pairs (pandas.DataFrame) – a data frame of (user, item) pairs to predict for. If this frame also contains a rating column, it will be included in the result.
- n_jobs (int) – the number of processes to use for parallel batch prediction, passed as n_jobs to joblib.Parallel. The default, None, makes the process sequential unless it is called inside the joblib.parallel_backend() context manager. Note: nprocs is accepted as a deprecated alias.

Returns:

A frame with columns user, item, and prediction containing the prediction results. If pairs contains a rating column, the result will also contain a rating column.

Return type:

pandas.DataFrame
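Because the default n_jobs=None runs sequentially, one way to parallelize is to call predict inside a joblib.parallel_backend() context, as the parameter description above notes. A sketch, reusing the bias model and ratings from the doctest:

from joblib import parallel_backend
from lenskit.batch import predict

# Run batch prediction across 4 worker processes instead of serially.
with parallel_backend('loky', n_jobs=4):
    preds = predict(bias, ratings[-1000:])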
Scripting Evaluation

The MultiEval class is useful for building scripts that evaluate multiple algorithms or algorithm variants simultaneously, across multiple data sets. It can extract parameters from algorithms and include them in the output, which is useful for hyperparameter search.
For example:
from lenskit.batch import MultiEval
from lenskit.crossfold import partition_users, SampleN
from lenskit.algorithms import basic, als
from lenskit.util import load_ml_ratings
from lenskit import topn
import pandas as pd
Generate the train-test pairs:
pairs = list(partition_users(load_ml_ratings(), 5, SampleN(5)))
Set up and run the MultiEval
experiment:
eval = MultiEval('my-eval', recommend=20)
eval.add_datasets(pairs, name='ML-Small')
eval.add_algorithms(basic.Popular(), name='Pop')
eval.add_algorithms([als.BiasedMF(f) for f in [20, 30, 40, 50]],
attrs=['features'], name='ALS')
eval.run()
Now that the experiment is run, we can read its outputs.
First the run metadata:
runs = pd.read_csv('my-eval/runs.csv')
runs.set_index('RunId', inplace=True)
runs.head()
| RunId | AlgoClass | AlgoStr | DataSet | Partition | PredTime | RecTime | TrainTime | features | name |
|-------|-----------|---------|---------|-----------|----------|---------|-----------|----------|------|
| 1 | Popular | Popular | ML-Small | 1 | NaN | 0.578916 | 0.278333 | NaN | Pop |
| 2 | BiasedMF | als.BiasedMF(features=20, regularization=0.1) | ML-Small | 1 | 0.377277 | 1.324478 | 5.426510 | 20.0 | ALS |
| 3 | BiasedMF | als.BiasedMF(features=30, regularization=0.1) | ML-Small | 1 | 0.326613 | 1.566073 | 1.300490 | 30.0 | ALS |
| 4 | BiasedMF | als.BiasedMF(features=40, regularization=0.1) | ML-Small | 1 | 0.408973 | 1.570634 | 1.904973 | 40.0 | ALS |
| 5 | BiasedMF | als.BiasedMF(features=50, regularization=0.1) | ML-Small | 1 | 0.357133 | 1.700047 | 2.390314 | 50.0 | ALS |
Then the recommendations:
recs = pd.read_parquet('my-eval/recommendations.parquet')
recs.head()
|   | item | score | user | rank | RunId |
|---|------|-------|------|------|-------|
| 0 | 356 | 335 | 6 | 1 | 1 |
| 1 | 296 | 323 | 6 | 2 | 1 |
| 2 | 318 | 305 | 6 | 3 | 1 |
| 3 | 593 | 302 | 6 | 4 | 1 |
| 4 | 260 | 284 | 6 | 5 | 1 |
In order to evaluate the recommendation list, we need to build a combined set of truth data. Since this is a disjoint partition of users over a single data set, we can just concatenate the individual test frames:
truth = pd.concat((p.test for p in pairs), ignore_index=True)
Now we can set up an analysis and compute the results.
rla = topn.RecListAnalysis()
rla.add_metric(topn.ndcg)
ndcg = rla.compute(recs, truth)
ndcg.head()
Next, we need to combine this with our run data, so that we know what algorithms and configurations we are evaluating:
ndcg = ndcg.join(runs[['AlgoClass', 'features']], on='RunId')
ndcg.head()
| user | RunId | ndcg | AlgoClass | features |
|------|-------|------|-----------|----------|
| 1 | 11 | 0.0 | Popular | NaN |
| 1 | 12 | 0.0 | BiasedMF | 20.0 |
| 1 | 13 | 0.0 | BiasedMF | 30.0 |
| 1 | 14 | 0.0 | BiasedMF | 40.0 |
| 1 | 15 | 0.0 | BiasedMF | 50.0 |
The Popular algorithm has a NaN feature count, which groupby doesn't like; let's fill those values in.
ndcg.loc[ndcg['AlgoClass'] == 'Popular', 'features'] = 0
And finally, we can compute the overall average performance for each algorithm configuration:
ndcg.groupby(['AlgoClass', 'features'])['ndcg'].mean()
AlgoClass features
BiasedMF 20.0 0.015960
30.0 0.022558
40.0 0.025901
50.0 0.028949
Popular 0.0 0.091814
Name: ndcg, dtype: float64
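To pick the best configuration programmatically, a small follow-up step (an assumption, not part of the original walkthrough) can reuse the same grouped means:

# Assumed follow-up: find the configuration with the highest mean nDCG.
mean_ndcg = ndcg.groupby(['AlgoClass', 'features'])['ndcg'].mean()
print(mean_ndcg.idxmax(), mean_ndcg.max())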
Multi-Eval Class Reference
class lenskit.batch.MultiEval(path, *, predict=True, recommend=100, candidates=None, save_models=False, eval_n_jobs=None, combine=True, **kwargs)

A runner for carrying out multiple evaluations, such as parameter sweeps.

Parameters:

- path (str or pathlib.Path) – the working directory for this evaluation. It will be created if it does not exist.
- predict (bool) – whether to generate rating predictions.
- recommend (int) – the number of recommendations to generate per user. Any falsy value (None, False, 0) will disable top-N recommendation. The literal value True will generate recommendation lists of unlimited size.
- candidates (function) – the default candidate set generator for recommendations. It should take the training data and return a candidate generator, itself a function mapping user IDs to candidate sets. Pass None to use the default candidate set configured for each algorithm (recommended).
- save_models (bool or str) – whether to save individual estimated models to disk. If True, models are pickled to .pkl files; if 'gzip', they are pickled to gzip-compressed .pkl.gz files; if 'joblib', they are pickled with joblib.dump() to uncompressed .jlpkl files.
- eval_n_jobs (int or None) – the value to pass to the n_jobs parameter of lenskit.batch.predict() and lenskit.batch.recommend().
- combine (bool) – whether to combine output; if False, output is left in separate files; if True, it is written to a single set of files (runs, recommendations, and predictions).
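As an illustration of these options, a sketch of a configuration that also saves compressed models and parallelizes the batch steps (the path and values here are illustrative, not defaults):

from lenskit.batch import MultiEval

# Illustrative configuration: predictions on, 50 recommendations per
# user, gzip-compressed pickled models, and 4 processes for the
# batch predict/recommend steps.
eval = MultiEval('sweep-output', predict=True, recommend=50,
                 save_models='gzip', eval_n_jobs=4)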
add_algorithms(algos, attrs=[], **kwargs)

Add one or more algorithms to the run.

Parameters:

- algos (algorithm or list) – the algorithm(s) to add.
- attrs (list of str) – a list of attributes to extract from the algorithm objects and include in the run descriptions.
- kwargs – additional attributes to include in the run descriptions.
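For instance, a sketch combining attrs with a free-form keyword attribute, assuming a MultiEval instance named eval as above (the 'family' attribute is illustrative):

from lenskit.algorithms import basic

# Record each variant's 'damping' attribute in the run descriptions,
# along with an illustrative free-form 'family' attribute.
eval.add_algorithms([basic.Bias(damping=d) for d in [0, 5, 10]],
                    attrs=['damping'], family='bias')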
add_datasets(data, name=None, candidates=None, **kwargs)

Add one or more datasets to the run.

Parameters:

- data – the input data set(s) to run. Can be one of the following:
  - a tuple of (train, test) data;
  - an iterable of (train, test) pairs, in which case the iterable is not consumed until it is needed;
  - a function yielding either of the above, to defer data loading until it is needed.
  Data can be either data frames or paths; paths are loaded after detection using util.read_df_detect().
- kwargs – additional attributes pertaining to these data sets.
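For example, a sketch of the deferred-loading option, passing a function so the crossfold partitions are not materialized until the run needs them (again assuming a MultiEval instance named eval):

from lenskit.crossfold import partition_users, SampleN
from lenskit.util import load_ml_ratings

# Defer data loading: the partitions are generated only when the
# evaluation actually runs.
def ml_pairs():
    return partition_users(load_ml_ratings(), 5, SampleN(5))

eval.add_datasets(ml_pairs, name='ML-Small')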
collect_results()

Collect the results from non-combined runs into combined output files.
persist_data()

Persist the data for an experiment, replacing in-memory data sets with file names. Once this has been called, the sweep can be pickled.
run(runs=None, *, progress=None)

Run the evaluation.

Parameters:

- runs (int or set-like) – if provided, a specific set of runs to run. Useful for splitting an experiment into individual runs. This is a set of 1-based run IDs, not 0-based indexes.
- progress – a tqdm.tqdm()-compatible progress function.
run_count()

Get the number of runs in this evaluation.
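Together, run() and run_count() support splitting an experiment into per-run jobs. A sketch of one assumed workflow, pairing combine=False in the constructor with a final collect_results():

# Assumed workflow: execute each 1-based run separately (e.g. one
# cluster job per run), then merge the per-run output files.
for run_id in range(1, eval.run_count() + 1):
    eval.run({run_id})
eval.collect_results()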