lenskit.pipeline.Pipeline

class lenskit.pipeline.Pipeline(config, nodes)

Bases: object

LensKit recommendation pipeline. This is the core abstraction for using LensKit models and other components to produce recommendations in a useful way. It allows you to wire together components in (mostly) arbitrary graphs, train them on data, and serialize pipelines to disk for use elsewhere.

Pipelines should not be instantiated directly; they must be built with a PipelineBuilder, or loaded from a configuration with from_config(). If you have a scoring model and just want to generate recommendations with a default setup and minimal configuration, see topn_pipeline() or RecPipelineBuilder.

Pipelines are also Trainable: training a pipeline trains all of its trainable components.

Stability:
Caller (see Stability Levels).
Parameters:
__init__(config, nodes)
Parameters:

Methods

__init__(config, nodes)

clone()

Clone the pipeline, without its trained parameters.

from_config(config)

Reconstruct a pipeline from a serialized configuration.

meta()

Get the metadata (name, version, hash, etc.) for this pipeline without returning the whole config.

modify()

Create a pipeline builder from this pipeline in order to modify it.

node(node, *[, missing])

Get the pipeline node with the specified name.

node_input_connections(node)

Get the input wirings for a node.

nodes()

Get the nodes in the pipeline graph.

run([nodes])

Run the pipeline and obtain the return value(s) of one or more of its components.

run_all(*nodes, **kwargs)

Run all nodes in the pipeline, or all nodes required to fulfill the requested node, and return a mapping with the full pipeline state (the data attached to each node).

train(data[, options])

Trains the pipeline's trainable components (those implementing the TrainableComponent interface) on some training data.

Attributes

config

Get the pipeline configuration.

config_hash

Get a hash of the pipeline's configuration to uniquely identify it for logging, version control, or other purposes.

name

Get the pipeline name (if configured).

version

Get the pipeline version (if configured).

property config: PipelineConfig

Get the pipeline configuration.

Important

Do not modify the configuration returned, or it will become out-of-sync with the pipeline and likely not behave correctly.

property name: str | None

Get the pipeline name (if configured).

property version: str | None

Get the pipeline version (if configured).

meta()

Get the metadata (name, version, hash, etc.) for this pipeline without returning the whole config.

Return type:

PipelineMeta

nodes()

Get the nodes in the pipeline graph.

Return type:

list[Node[object]]

node(node, *, missing='error')

Get the pipeline node with the specified name. If passed a node, it returns the node or fails if the node is not a member of the pipeline.

Parameters:
  • node (str | Node[Any]) – The name of the pipeline node to look up, or a node to check for membership.

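  • missing (Literal['error', 'none'] | None) – How to handle a node that does not exist: 'error' (the default) raises KeyError, and 'none' returns None.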

Returns:

The pipeline node, if it exists.

Raises:

KeyError – The specified node does not exist.

Return type:

Node[object] | None

node_input_connections(node)

Get the input wirings for a node.

Parameters:

node (str | Node[Any])

Return type:

Mapping[str, Node[Any]]

clone()

Clone the pipeline, without its trained parameters.

Returns:

A new pipeline with the same components and wiring, but fresh instances created by round-tripping the configuration.

Return type:

Pipeline
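The idea of cloning "by round-tripping the configuration" can be illustrated with a single toy component. This is a minimal sketch, not LensKit's implementation: `ToyScorer`, its `dump_config()` method, and its `trained` attribute are hypothetical stand-ins for a configurable component that also carries learned state. Because only configuration is serialized, the rebuilt instance has the same settings but none of the trained parameters.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyScorer:
    """Hypothetical component: configuration plus learned parameters."""

    damping: float = 5.0
    # Learned state, filled in by training; not part of the configuration.
    trained: dict[str, Any] = field(default_factory=dict, repr=False)

    def dump_config(self) -> dict[str, Any]:
        # Only configuration values are serialized, never learned state.
        return {"damping": self.damping}

    def train(self, ratings: list[float]) -> None:
        self.trained["mean"] = sum(ratings) / len(ratings)


def clone_component(comp: ToyScorer) -> ToyScorer:
    # Round-trip through the configuration: a fresh instance with the same
    # settings but without the original's trained parameters.
    return type(comp)(**comp.dump_config())


original = ToyScorer(damping=10.0)
original.train([3.0, 4.0, 5.0])
fresh = clone_component(original)
print(fresh.damping, fresh.trained)  # 10.0 {}
```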

property config_hash: str

Get a hash of the pipeline’s configuration to uniquely identify it for logging, version control, or other purposes.

The hash format and algorithm are not guaranteed, but they are stable within a LensKit version. For the same version of LensKit and component code, the same configuration produces the same hash, as long as there are no literal nodes. Literal nodes usually hash consistently, but because literals other than basic JSON values are hashed by pickling, their hash stability depends on the stability of the pickle bytestream.

In LensKit 2025.1, the configuration hash is the hex-encoded SHA-256 digest of the JSON serialization of the pipeline configuration, serialized without the hash itself.
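A minimal sketch of this kind of scheme using only the Python standard library. The exact JSON layout LensKit serializes (key order, whitespace, and where the hash is stored) is not specified here, so the sorted-key compact encoding and the hypothetical `meta.hash` location below are assumptions; the sketch will not reproduce LensKit's actual hash values. It does show why the stored hash must be excluded: otherwise the hash would depend on itself.

```python
import hashlib
import json
from typing import Any


def config_hash(config: dict[str, Any]) -> str:
    """Hex-encoded SHA-256 of a JSON-serialized config, ignoring any stored hash.

    Assumes (hypothetically) that a previously computed hash lives at
    config["meta"]["hash"]; it is dropped before hashing.
    """
    # Deep-copy via JSON so the caller's configuration is left untouched.
    cfg = json.loads(json.dumps(config))
    cfg.get("meta", {}).pop("hash", None)
    # Sorted keys and compact separators make the serialization deterministic.
    text = json.dumps(cfg, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


cfg = {"meta": {"name": "demo", "version": "1"}, "components": {"scorer": {"damping": 5}}}
h = config_hash(cfg)
cfg["meta"]["hash"] = h
# Storing the hash in the config does not change the recomputed hash.
assert config_hash(cfg) == h
```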

static from_config(config)

Reconstruct a pipeline from a serialized configuration.

Parameters:

config (object) – The configuration object, as loaded from JSON, TOML, YAML, or similar. Will be validated into a PipelineConfig.

Returns:

The configured (but not trained) pipeline.

Raises:

PipelineError – If there is a configuration error reconstructing the pipeline.

Warns:

PipelineWarning – If the configuration is unusual but usable; for example, the configuration includes a hash that does not match the hash of the constructed pipeline.

Return type:

Pipeline

modify()

Create a pipeline builder from this pipeline in order to modify it.

Pipelines cannot be modified in-place, but this method sets up a new builder that will create a modified copy of the pipeline. Unmodified component instances are reused as-is.

Note

Since default connections are applied in build(), the modifying builder does not have default connections.

Return type:

PipelineBuilder
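The copy-on-modify pattern can be sketched with a toy immutable pipeline. `ToyPipeline`, `ToyBuilder`, and `set_component()` are hypothetical names for illustration, not LensKit classes. The builder starts from a shallow copy of the component mapping, so components you do not replace are the very same objects in the new pipeline, and the original pipeline is never changed.

```python
from __future__ import annotations

from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Mapping


@dataclass(frozen=True)
class ToyPipeline:
    """Hypothetical immutable pipeline: a read-only name -> component map."""

    components: Mapping[str, Any]

    def modify(self) -> ToyBuilder:
        # Shallow copy: the builder shares the existing component objects.
        return ToyBuilder(dict(self.components))


class ToyBuilder:
    def __init__(self, components: dict[str, Any]) -> None:
        self._components = components

    def set_component(self, name: str, component: Any) -> None:
        if name not in self._components:
            raise KeyError(name)
        self._components[name] = component

    def build(self) -> ToyPipeline:
        # Freeze a copy so later builder changes cannot leak into the result.
        return ToyPipeline(MappingProxyType(dict(self._components)))


scorer, ranker = object(), object()
base = ToyPipeline(MappingProxyType({"scorer": scorer, "ranker": ranker}))

builder = base.modify()
builder.set_component("ranker", object())
modified = builder.build()
```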

train(data, options=None)

Trains the pipeline’s trainable components (those implementing the TrainableComponent interface) on some training data.

Random Number Generation

If TrainingOptions.rng is set and is not a generator or bit generator (i.e. it is a seed), then this method wraps the seed in a SeedSequence and calls spawn() to generate a distinct seed for each component in the pipeline.

Parameters:
  • data (Dataset) – The dataset to train on.

  • options (TrainingOptions | None) – The training options. If None, default options are used.

Return type:

None
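The seeding behavior described under Random Number Generation can be sketched with NumPy directly. This shows the general SeedSequence / spawn() pattern, not LensKit's own training loop; the component names are hypothetical, and the sketch assumes components are visited in a fixed order so that each one receives the same child seed on every run.

```python
import numpy as np


def spawn_component_rngs(seed: int, names: list[str]) -> dict[str, np.random.Generator]:
    """Derive one independent generator per component from a single seed."""
    # Wrap the integer seed, then spawn one child SeedSequence per component.
    children = np.random.SeedSequence(seed).spawn(len(names))
    return {name: np.random.default_rng(child) for name, child in zip(names, children)}


# Hypothetical component names, for illustration only.
names = ["scorer", "ranker", "sampler"]
rngs = spawn_component_rngs(42, names)
again = spawn_component_rngs(42, names)

first_draws = {name: rng.random() for name, rng in rngs.items()}
repeat_draws = {name: rng.random() for name, rng in again.items()}
```

The same seed reproduces the same per-component streams, while different components draw from distinct streams.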

run(nodes=None, /, **kwargs)

Run the pipeline and obtain the return value(s) of one or more of its components. See Execution for details of the pipeline execution model.

Parameters:
  • nodes (str | Node[Any] | tuple[str, ...] | tuple[Node[Any], ...] | None) – The component(s) to run.

  • kwargs (object) – The pipeline’s inputs, as defined with create_input().

Returns:

The pipeline result. If no nodes are supplied, this is the result of the default node. If a single node is supplied, it is the result of that node. If a tuple of nodes is supplied, it is a tuple of their results.

Raises:
  • PipelineError – when there is a pipeline configuration error (e.g. a cycle).

  • ValueError – when one or more required inputs are missing.

  • TypeError – when one or more required inputs has an incompatible type.

  • other – exceptions thrown by components are passed through.

Return type:

object
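To make the demand-driven model concrete, here is a toy evaluator over a plain dictionary graph. It is a hypothetical sketch, not LensKit's execution engine: the graph format and node names are invented, and it has no type checking, no cycle detection, and no default node. It shows that only the requested nodes and their dependencies run, that the result is a single value or a tuple matching the request, and that a missing input surfaces as a ValueError.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical graph format: node name -> (function, names of its inputs).
# Pipeline inputs (here "query") have no entry and are passed as keywords.
Graph = dict[str, tuple[Callable[..., Any], tuple[str, ...]]]


def run(graph: Graph, nodes: str | tuple[str, ...], **inputs: Any) -> Any:
    """Evaluate only the nodes needed for the requested result(s)."""
    cache: dict[str, Any] = dict(inputs)

    def evaluate(name: str) -> Any:
        if name in cache:
            return cache[name]
        if name not in graph:
            raise ValueError(f"missing required input: {name}")
        func, deps = graph[name]
        # Compute dependencies first; each node runs at most once per call.
        cache[name] = func(*(evaluate(dep) for dep in deps))
        return cache[name]

    if isinstance(nodes, tuple):
        return tuple(evaluate(n) for n in nodes)
    return evaluate(nodes)


calls: list[str] = []


def score(query: str) -> dict[str, float]:
    calls.append("score")
    return {"a": 0.9, "b": 0.4, "c": 0.7}


def rank(scores: dict[str, float]) -> list[str]:
    calls.append("rank")
    return sorted(scores, key=scores.get, reverse=True)


def explain(scores: dict[str, float]) -> str:
    calls.append("explain")
    return f"{len(scores)} items scored"


graph: Graph = {
    "scores": (score, ("query",)),
    "ranking": (rank, ("scores",)),
    "explanation": (explain, ("scores",)),
}

ranking = run(graph, "ranking", query="user-1")
calls_after_first = list(calls)  # "explain" did not run: "ranking" does not need it
both = run(graph, ("ranking", "explanation"), query="user-1")
```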

run_all(*nodes, **kwargs)

Run all nodes in the pipeline, or all nodes required to fulfill the requested node, and return a mapping with the full pipeline state (the data attached to each node). This is useful in cases where client code needs to be able to inspect the data at arbitrary steps of the pipeline. It differs from run() in two ways:

  1. It returns the data from all nodes as a mapping (dictionary-like object), not just the specified nodes as a tuple.

  2. If no nodes are specified, it runs all nodes. As a consequence, it also runs nodes that are not required to fulfill the last node (such scenarios typically arise from using use_first_of()).

Parameters:
  • nodes (str | Node[Any]) – The nodes to run, as positional arguments (if no nodes are specified, this method runs all nodes).

  • kwargs (object) – The inputs.

Returns:

The full pipeline state, with default set to the last node specified.

Return type:

PipelineState
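The difference from run() can be seen in a toy evaluator that returns its whole evaluation state. This is a hypothetical sketch: the graph format, node names, and plain-dict return value are assumptions (LensKit returns a PipelineState, which also records a default node). The `fallback` node loosely stands in for an alternative branch of the kind use_first_of() can introduce; it runs only when no nodes are requested.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical graph format: node name -> (function, names of its inputs).
Graph = dict[str, tuple[Callable[..., Any], tuple[str, ...]]]


def run_all(graph: Graph, *nodes: str, **inputs: Any) -> dict[str, Any]:
    """Return the data attached to every evaluated node, inputs included."""
    state: dict[str, Any] = dict(inputs)

    def evaluate(name: str) -> Any:
        if name not in state:
            func, deps = graph[name]
            state[name] = func(*(evaluate(dep) for dep in deps))
        return state[name]

    # With no requested nodes, evaluate every node, needed or not.
    for name in nodes or tuple(graph):
        evaluate(name)
    return state


graph: Graph = {
    "scores": (lambda q: {"a": 0.9, "b": 0.4}, ("query",)),
    "ranking": (lambda s: sorted(s, key=s.get, reverse=True), ("scores",)),
    "fallback": (lambda q: ["popular-1", "popular-2"], ("query",)),
}

full = run_all(graph, query="user-1")                # every node, including "fallback"
partial = run_all(graph, "ranking", query="user-1")  # only "ranking" and its inputs
```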