Parallel Execution#
LensKit supports several forms of parallel execution, each controlled by an environment variable:
- Batch operations using multi-process execution.
- Parallel model training. For most models provided by LensKit, this is usually implemented using PyTorch JIT parallelism (torch.jit.fork()).
- Parallel computation in the various backends (BLAS, MKL, Torch, etc.).
Other models compatible with LensKit may use their own parallel processing logic.
Configuring Parallelism#
LensKit provides four knobs for configuring parallelism, each of which has a corresponding environment variable and a parameter to initialize(). The environment variables are:
- LK_NUM_PROCS#
The number of processes to use for batch operations. Defaults to the number of CPUs or 4, whichever is lower.
- LK_NUM_THREADS#
The number of threads to use for parallel model building. Defaults to the number of CPUs or 8, whichever is smaller.
This number is passed to torch.set_num_interop_threads() to set up the Torch JIT thread count.
- LK_NUM_BACKEND_THREADS#
The number of threads to be used by backend compute engines. Defaults to up to 4 backend threads per training thread, depending on the capacity of the machine:
max(min(NCPUS // LK_NUM_THREADS, 4), 1)
This value is passed to torch.set_num_threads() (to control PyTorch internal parallelism) and to the underlying BLAS layer (via threadpoolctl).
- LK_NUM_CHILD_THREADS#
The number of backend threads to be used in worker processes spawned by batch evaluation. Defaults to 4 per process, capped by the number of CPUs available:
max(min(NCPUS // LK_NUM_PROCS, 4), 1)
Workers have both the process and thread counts set to 1.
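As a sketch, the defaults above can be computed in plain Python. This mirrors the formulas stated above (NCPUS stands for the detected CPU count); it is an illustration, not LensKit's actual code:

```python
import os

# Number of CPUs available (falls back to 1 if undetectable).
NCPUS = os.cpu_count() or 1

# LK_NUM_PROCS: the number of CPUs or 4, whichever is lower.
num_procs = min(NCPUS, 4)

# LK_NUM_THREADS: the number of CPUs or 8, whichever is smaller.
num_threads = min(NCPUS, 8)

# LK_NUM_BACKEND_THREADS: up to 4 backend threads per training thread.
backend_threads = max(min(NCPUS // num_threads, 4), 1)

# LK_NUM_CHILD_THREADS: up to 4 backend threads per worker process.
child_threads = max(min(NCPUS // num_procs, 4), 1)

print(num_procs, num_threads, backend_threads, child_threads)
```

On a 16-core machine, for example, this yields 4 processes, 8 training threads, 2 backend threads per trainer, and 4 backend threads per batch worker.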
- lenskit.parallel.initialize(*, processes=None, threads=None, backend_threads=None, child_threads=None)#
Set up and configure LensKit parallelism. This only needs to be called if you want to control when and how parallelism is set up; components using parallelism will call ensure_parallel_init(), which will call this function with its default arguments if it has not been called.
- Parameters:
- processes (int | None) – The number of processes to use for multiprocessing evaluations (see LK_NUM_PROCS).
- threads (int | None) – The number of threads to use for parallel model training and similar operations (see LK_NUM_THREADS).
- backend_threads (int | None) – The number of threads underlying computational engines should use (see LK_NUM_BACKEND_THREADS).
- child_threads (int | None) – The number of threads backends are allowed to use in the worker processes in multiprocessing operations (see LK_NUM_CHILD_THREADS).
- lenskit.parallel.ensure_parallel_init()#
Make sure LensKit parallelism is configured, and configure with defaults if it is not.
Components using parallelism or intensive computations should call this function before they begin training.
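For example, the four knobs can be set through their environment variables before LensKit initializes parallelism; the values below are purely illustrative:

```python
import os

# Illustrative settings; these must be in place before LensKit
# parallelism is initialized for them to take effect.
os.environ["LK_NUM_PROCS"] = "2"            # batch worker processes
os.environ["LK_NUM_THREADS"] = "8"          # model-training threads
os.environ["LK_NUM_BACKEND_THREADS"] = "4"  # BLAS/Torch threads per trainer
os.environ["LK_NUM_CHILD_THREADS"] = "2"    # backend threads per batch worker
```

Equivalently, the same values can be passed directly as the processes, threads, backend_threads, and child_threads parameters to initialize().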
Parallel Model Ops#
LensKit uses a custom API wrapping multiprocessing.pool.Pool to parallelize batch operations (see lenskit.batch).
The basic idea of this API is to create an invoker that holds a model and a function, and then pass argument sets to the invoker's map method:
with invoker(model, func) as inv:
    results = list(inv.map(args))
The model is persisted into shared memory to be used by the worker processes. PyTorch tensors, including those on CUDA devices, are shared.
LensKit users will generally not need to directly use parallel op invokers, but if you are implementing new batch operations with parallelism they are useful. They may also be useful for other kinds of analysis.
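To illustrate the shape of this pattern, here is a hypothetical serial stand-in with the same model-plus-function interface; it is not LensKit's ModelOpInvoker implementation, which dispatches tasks to worker processes:

```python
from typing import Callable, Generic, Iterable, Iterator, TypeVar

M = TypeVar("M")  # model type
A = TypeVar("A")  # argument type
R = TypeVar("R")  # result type


class SerialInvoker(Generic[M, A, R]):
    """Toy invoker: holds a model and a function, and applies the
    function to (model, task) pairs. Runs serially, whereas a real
    invoker may distribute tasks across worker processes."""

    def __init__(self, model: M, func: Callable[[M, A], R]):
        self.model = model
        self.func = func

    def map(self, tasks: Iterable[A]) -> Iterator[R]:
        # Like the built-in map(), but supplies the model as the
        # first argument to func.
        return (self.func(self.model, t) for t in tasks)

    def shutdown(self):
        pass  # a real invoker would release its workers here

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.shutdown()


def scale(model: int, x: int) -> int:
    return model * x


with SerialInvoker(10, scale) as inv:
    results = list(inv.map([1, 2, 3]))
print(results)  # [10, 20, 30]
```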
- lenskit.parallel.invoker(model, func, n_jobs=None, progress=None)#
Get an appropriate invoker for performing operations on model.
- Parameters:
- model (M) – The model object on which to perform operations.
- func (Callable[[M, A], R]) – The function to call. The function must be pickleable.
- n_jobs (int | None) – The number of processes to use for parallel operations. If None, will call proc_count() with a maximum default process count of 4.
- progress (Progress | None) – A progress bar to use to report status. It should have the following states: dispatched, in-progress, finished. One can be created with invoke_progress().
- Returns:
An invoker to perform operations on the model.
- Return type:
ModelOpInvoker
- class lenskit.parallel.ModelOpInvoker#
Interface for invoking operations on a model, possibly in parallel. The operation invoker is configured with a model and a function to apply, and applies that function to the arguments supplied in map. Child-process invokers also route logging messages to the parent process, so logging works even with multiprocessing.
An invoker is a context manager that calls shutdown() when exited.
- abstract map(tasks)#
Apply the configured function to the model and iterables. This is like map(), except it supplies the invoker's model as the first object to func.
- Parameters:
tasks (Iterable[A]) – Iterables of arguments to provide to the function.
- Returns:
An iterable of the results.
- Return type:
iterable
Logging and Progress#
Multi-process op invokers automatically set up logging and progress reporting to work across processes using the manylog package. Op invokers can also report the progress of queued jobs to a progress_api.Progress.
Computing Work Chunks#
The WorkChunks class provides support for dividing work into chunks for parallel processing, particularly for model training.
- class lenskit.parallel.chunking.WorkChunks(total, chunk_size, chunk_count)#
Bases: NamedTuple
The chunking configuration for parallel work.
- classmethod create(njobs)#
Compute the chunk size for parallel model training.
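As an illustration of this kind of chunking computation, the sketch below divides a total work count into roughly equal chunks. The helper, its signature, and its ceiling-division heuristic are hypothetical; the actual formula used by WorkChunks.create may differ:

```python
from typing import NamedTuple


class Chunks(NamedTuple):
    """Hypothetical mirror of the WorkChunks tuple, for illustration."""
    total: int
    chunk_size: int
    chunk_count: int


def make_chunks(total: int, njobs: int) -> Chunks:
    # Illustrative heuristic: one chunk per job, sized by
    # ceiling division so every item is covered.
    chunk_count = max(njobs, 1)
    chunk_size = -(-total // chunk_count)  # ceiling division
    return Chunks(total, chunk_size, chunk_count)


print(make_chunks(100, 8))  # Chunks(total=100, chunk_size=13, chunk_count=8)
```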