Batch-Running Pipelines
Offline recommendation experiments require batch-running a pipeline over a set
of test users, sessions, or other recommendation requests. LensKit supports this
through the facilities in the lenskit.batch module.
By default, the batch facilities operate in parallel over the test users; this
can be controlled by environment variables (see Configuring Parallelism) or
through an n_jobs keyword argument to the various functions and classes.
Import Protection
Scripts using batch pipeline operations must be protected; see parallel-protecting.
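The protection referred to here is presumably the usual Python idiom for scripts that start worker processes: keep the work behind a `__main__` guard so that a worker re-importing the script does not start the batch run again. The sketch below shows that idiom with the standard library's multiprocessing module standing in for a batch run; `score_user` and `main` are hypothetical names for illustration and are not part of LensKit.

```python
import multiprocessing as mp


def score_user(user_id):
    # Stand-in for per-user work; a real script would run a pipeline here.
    return user_id * 2


def main():
    users = [1, 2, 3, 4]
    # Worker processes may re-import this module (for example with the
    # "spawn" start method), so nothing that launches workers should run
    # at import time.
    with mp.Pool(processes=2) as pool:
        return pool.map(score_user, users)


if __name__ == "__main__":
    # Only the process that runs the script directly reaches this branch.
    print(main())
```

With this layout, importing the module has no side effects, and the parallel work starts only when the file is executed as a script.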
Simple Runs
If you have a pipeline and simply want to generate recommendations for a batch
of test users, you can do this with the recommend() function.
As an example, start by importing what a quick batch run needs:
>>> from lenskit.basic import PopScorer
>>> from lenskit.pipeline import topn_pipeline
>>> from lenskit.batch import recommend
>>> from lenskit.data import load_movielens
>>> from lenskit.splitting import sample_users, SampleN
>>> from lenskit.metrics import RunAnalysis, RBP
Load and split some data:
>>> data = load_movielens('data/ml-100k.zip')
>>> split = sample_users(data, 150, SampleN(5))
Configure and train the model:
>>> model = PopScorer()
>>> pop_pipe = topn_pipeline(model, n=20)
>>> pop_pipe.train(split.train)
Generate recommendations:
>>> recs = recommend(pop_pipe, split.test.keys(), n_jobs=1)
>>> recs.to_df()
user_id item_id score rank
0 ... 1
...
[3000 rows x 4 columns]
And measure their results:
>>> measure = RunAnalysis()
>>> measure.add_metric(RBP())
>>> scores = measure.compute(recs, split.test)
>>> scores.list_summary()
mean median std
metric
RBP 0.07... 0.0... 0.1...
The predict() function works similarly, but for rating predictions;
instead of a simple list of user IDs, it takes a dictionary mapping user IDs to
lists of test items (as ItemList).
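As a rough illustration of the difference in input shape, the sketch below uses plain Python lists as stand-ins for ItemList objects. The user and item IDs are made up, and the variable names are hypothetical; real code would build ItemList values and pass them to predict().

```python
# Hypothetical user and item IDs, for illustration only.
# A recommendation request only names the users to generate lists for.
recommend_request = [10, 42, 77]

# A prediction request also names, for each user, the items whose ratings
# should be predicted. Plain lists stand in for ItemList objects here.
predict_request = {
    10: [1, 50, 181],
    42: [7, 50],
    77: [100],
}

# Both requests cover the same users; only the prediction request
# carries target items.
assert list(predict_request) == recommend_request

# One prediction is expected per (user, item) pair.
n_predictions = sum(len(items) for items in predict_request.values())
print(n_predictions)
```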
General Batch Pipeline Runs
The recommend() and predict() functions are convenience
wrappers around a more general facility, the BatchPipelineRunner.