Batch-Running Pipelines

Offline recommendation experiments require batch-running a pipeline over a set of test users, sessions, or other recommendation requests. LensKit supports this through the facilities in the lenskit.batch module.

By default, the batch facilities operate in parallel over the test users; this can be controlled by environment variables (see Configuring Parallelism) or through an n_jobs keyword argument to the various functions and classes.

Import Protection

Scripts that use batch pipeline operations must be import-protected, so that importing the script does not run its work; see parallel-protecting.
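
As a rough sketch of what such protection can look like, assuming it refers to the usual Python `if __name__ == "__main__":` guard for multiprocessing code: the script defines its work in functions and runs them only when executed directly. The standard library's `multiprocessing.Pool` stands in here for the batch workers, and `score_user` and `main` are hypothetical names chosen for illustration.

```python
import multiprocessing as mp


def score_user(user_id):
    # Hypothetical stand-in for the per-user work done in a worker process.
    return user_id * user_id


def main():
    # All real work (loading data, training, batch runs) belongs in here,
    # so that importing this file has no side effects.
    with mp.Pool(processes=2) as pool:
        return pool.map(score_user, [1, 2, 3])


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when a
    # worker process re-imports the module.
    print(main())
```

Without the guard, a worker process that re-imports the script could start the whole experiment again instead of just picking up its share of the work.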

Simple Runs

If you have a pipeline and only need recommendations for a batch of test users, use the recommend() function.

As an example, start by importing what a quick batch run needs:

>>> from lenskit.basic import PopScorer
>>> from lenskit.pipeline import topn_pipeline
>>> from lenskit.batch import recommend
>>> from lenskit.data import load_movielens
>>> from lenskit.splitting import sample_users, SampleN
>>> from lenskit.metrics import RunAnalysis, RBP

Load and split some data:

>>> data = load_movielens('data/ml-100k.zip')
>>> split = sample_users(data, 150, SampleN(5))

Configure and train the model:

>>> model = PopScorer()
>>> pop_pipe = topn_pipeline(model, n=20)
>>> pop_pipe.train(split.train)

Generate recommendations:

>>> recs = recommend(pop_pipe, split.test.keys(), n_jobs=1)
>>> recs.to_df()
          user_id  item_id     score  rank
0 ...                                    1
...
[3000 rows x 4 columns]

Each of the 150 sampled test users receives 20 recommendations, for 3000 rows in total. Measure the results:

>>> measure = RunAnalysis()
>>> measure.add_metric(RBP())
>>> scores = measure.compute(recs, split.test)
>>> scores.list_summary()
        mean    median     std
metric
RBP  0.07...    0.0...  0.1...
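
For intuition about the RBP column, the following is a minimal sketch of un-normalized rank-biased precision for a single recommendation list. It is not LensKit's implementation: the function name is hypothetical, and the patience value is an illustrative choice that may differ from the library's default.

```python
def rank_biased_precision(ranked_items, relevant_items, patience=0.85):
    """Un-normalized RBP for one ranked list (illustrative only)."""
    relevant = set(relevant_items)
    total = 0.0
    for position, item in enumerate(ranked_items):
        if item in relevant:
            # position is 0-based, so the item at rank 1 gets weight 1.
            total += patience ** position
    return (1.0 - patience) * total


# Relevant items sit at ranks 2 and 4 of a four-item list.
rbp = rank_biased_precision([10, 20, 30, 40], {20, 40}, patience=0.5)
print(rbp)  # 0.3125
```

Relevant items near the top of the list contribute the most, and each further rank is discounted by the patience factor.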

The predict() function works similarly but produces rating predictions. Instead of a simple list of user IDs, it takes a dictionary mapping each user ID to the test items to score for that user (as an ItemList).
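
The shape of that dictionary can be sketched with plain Python data. This example only illustrates the grouping: the `test_pairs` values are made up, and plain lists stand in for the ItemList objects that a real call to predict() would need.

```python
from collections import defaultdict

# Hypothetical held-out (user_id, item_id) pairs to predict ratings for.
test_pairs = [(1, 50), (1, 172), (2, 50), (3, 181), (3, 258)]

# Group the test items by user, preserving their original order.
test_items = defaultdict(list)
for user_id, item_id in test_pairs:
    test_items[user_id].append(item_id)

test_items = dict(test_items)
print(test_items)  # {1: [50, 172], 2: [50], 3: [181, 258]}
```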

General Batch Pipeline Runs

The recommend() and predict() functions are convenience wrappers around a more general facility, the BatchPipelineRunner.