lenskit.pipeline.Pipeline
- class lenskit.pipeline.Pipeline(config, nodes)
Bases: object
LensKit recommendation pipeline. This is the core abstraction for using LensKit models and other components to produce recommendations in a useful way. It allows you to wire together components in (mostly) arbitrary graphs, train them on data, and serialize pipelines to disk for use elsewhere.
Pipelines should not be directly instantiated; they must be built with the PipelineBuilder class, or loaded from a configuration with from_config(). If you have a scoring model and just want to generate recommendations with a default setup and minimal configuration, see topn_pipeline() or RecPipelineBuilder.
Pipelines are also Trainable; training a pipeline trains all of its trainable components.
- Stability:
Caller (see Stability Levels).
- Parameters:
config (config.PipelineConfig)
nodes (Iterable[Node[Any]])
- __init__(config, nodes)
- Parameters:
config (PipelineConfig)
Methods
- __init__(config, nodes)
- clone(): Clone the pipeline, without its trained parameters.
- from_config(config): Reconstruct a pipeline from a serialized configuration.
- meta(): Get the metadata (name, version, hash, etc.) for this pipeline without returning the whole config.
- modify(): Create a pipeline builder from this pipeline in order to modify it.
- node(node, *[, missing]): Get the pipeline node with the specified name.
- node_input_connections(node): Get the input wirings for a node.
- nodes(): Get the nodes in the pipeline graph.
- run([nodes]): Run the pipeline and obtain the return value(s) of one or more of its components.
- run_all(*nodes, **kwargs): Run all nodes in the pipeline, or all nodes required to fulfill the requested nodes, and return a mapping with the full pipeline state (the data attached to each node).
- train(data[, options]): Trains the pipeline's trainable components (those implementing the TrainableComponent interface) on some training data.
Attributes
- config: Get the pipeline configuration.
- config_hash: Get a hash of the pipeline's configuration to uniquely identify it for logging, version control, or other purposes.
- name: Get the pipeline name (if configured).
- version: Get the pipeline version (if configured).
- property config: PipelineConfig
Get the pipeline configuration.
Important
Do not modify the returned configuration; it would become out of sync with the pipeline, and the pipeline would then likely not behave correctly.
- meta()
Get the metadata (name, version, hash, etc.) for this pipeline without returning the whole config.
- Return type:
PipelineMeta
- node(node, *, missing='error')
Get the pipeline node with the specified name. If passed a node, it returns the node or fails if the node is not a member of the pipeline.
- node_input_connections(node)
Get the input wirings for a node.
- clone()
Clone the pipeline, without its trained parameters.
- Returns:
A new pipeline with the same components and wiring, but fresh instances created by round-tripping the configuration.
- Return type:
Pipeline
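To illustrate what "without its trained parameters" means, here is a minimal sketch of configuration round-tripping for a single component. ToyScorer, ToyScorerConfig, and clone_component are hypothetical names, not LensKit classes; the sketch assumes only that a component's configuration holds its hyperparameters while learned state lives elsewhere on the instance.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ToyScorerConfig:
    """Hypothetical component configuration: hyperparameters only."""
    k: int = 20


class ToyScorer:
    """Hypothetical trainable component (not a LensKit class)."""

    def __init__(self, config: ToyScorerConfig):
        self.config = config
        self.item_means: dict[str, float] = {}  # learned state, filled by train()

    def train(self, ratings: dict[str, list[float]]) -> None:
        self.item_means = {item: sum(rs) / len(rs) for item, rs in ratings.items()}


def clone_component(comp: ToyScorer) -> ToyScorer:
    # Round-trip the configuration through JSON, as a serialized pipeline
    # configuration would be, and build a fresh instance from it. Learned
    # state is not part of the configuration, so it is not carried over.
    data = json.loads(json.dumps(asdict(comp.config)))
    return type(comp)(ToyScorerConfig(**data))


scorer = ToyScorer(ToyScorerConfig(k=10))
scorer.train({"a": [4.0, 5.0], "b": [2.0]})
fresh = clone_component(scorer)
print(fresh.config.k, fresh.item_means)  # 10 {}
```

Because the clone is rebuilt from serialized settings alone, it has to be trained again before it can produce useful output.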
- property config_hash: str
Get a hash of the pipeline’s configuration to uniquely identify it for logging, version control, or other purposes.
The hash format and algorithm are not guaranteed, but they are stable within a LensKit version. For the same version of LensKit and component code, the same configuration will produce the same hash, so long as there are no literal nodes. Literal nodes will usually hash consistently, but since literals other than basic JSON values are hashed by pickling, hash stability depends on the stability of the pickle bytestream.
In LensKit 2025.1, the configuration hash is computed by serializing the pipeline configuration (without its hash) to JSON and returning the hex-encoded SHA-256 hash of that serialization.
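The hashing scheme described above can be approximated with the standard library. This is a sketch, not LensKit's code: it assumes a plain-dict configuration with the hash stored under meta["hash"], and it assumes sorted keys and compact separators for the JSON serialization, so its digests will not match the ones LensKit produces.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Approximate config hash under an assumed scheme (not LensKit's code)."""
    # Drop the stored hash (assumed to live at meta["hash"]) so that the
    # hash never depends on itself.
    meta = {key: value for key, value in config.get("meta", {}).items() if key != "hash"}
    body = {**config, "meta": meta}
    # Serialize deterministically: sorted keys and compact separators.
    text = json.dumps(body, sort_keys=True, separators=(",", ":"))
    # Hex-encoded SHA-256 of the serialized configuration.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


cfg = {
    "meta": {"name": "demo", "version": "1"},
    "components": {"scorer": {"code": "example:Scorer", "config": {"k": 20}}},
}
digest = config_hash(cfg)
print(len(digest))  # 64

# Embedding the hash in the configuration does not change the hash.
stamped = {**cfg, "meta": {**cfg["meta"], "hash": digest}}
assert config_hash(stamped) == digest
```

Sorting keys is what makes the digest independent of dictionary insertion order, which matters when the same configuration is loaded from different file formats.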
- static from_config(config)
Reconstruct a pipeline from a serialized configuration.
- Parameters:
config (object) – The configuration object, as loaded from JSON, TOML, YAML, or similar. It will be validated into a PipelineConfig.
- Returns:
The configured (but not trained) pipeline.
- Raises:
PipelineError – If there is a configuration error reconstructing the pipeline.
- Warns:
PipelineWarning – If the configuration is unusual but usable; for example, if the configuration includes a hash that does not match the hash of the constructed pipeline.
- Return type:
Pipeline
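The hash check behind the PipelineWarning can be sketched as follows. ToyPipelineWarning and check_config_hash are hypothetical names, and the hashing scheme (sorted-key JSON, hash stored under meta["hash"]) is an assumption; the sketch only shows the idea of recomputing the hash and warning on a mismatch instead of failing.

```python
import hashlib
import json
import warnings


class ToyPipelineWarning(UserWarning):
    """Hypothetical stand-in for LensKit's PipelineWarning."""


def _hash(config: dict) -> str:
    # Assumed scheme: SHA-256 of sorted-key JSON, excluding meta["hash"].
    meta = {k: v for k, v in config.get("meta", {}).items() if k != "hash"}
    text = json.dumps({**config, "meta": meta}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_config_hash(config: dict) -> bool:
    """Return True if the stored hash is absent or matches; warn otherwise."""
    stored = config.get("meta", {}).get("hash")
    if stored is None or stored == _hash(config):
        return True
    # A mismatch is suspicious but not fatal, so warn instead of raising.
    warnings.warn(
        f"configuration hash {stored[:8]}... does not match the computed hash",
        ToyPipelineWarning,
        stacklevel=2,
    )
    return False


cfg = {"meta": {"name": "demo"}, "components": {}}
good = {**cfg, "meta": {**cfg["meta"], "hash": _hash(cfg)}}
bad = {**cfg, "meta": {**cfg["meta"], "hash": "0" * 64}}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(check_config_hash(good), check_config_hash(bad))  # True False
print([w.category.__name__ for w in caught])  # ['ToyPipelineWarning']
```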
- modify()
Create a pipeline builder from this pipeline in order to modify it.
Pipelines cannot be modified in place, but this method sets up a new builder that will create a modified copy of the pipeline. Unmodified component instances are reused as-is.
Note
Since default connections are applied in build(), the modifying builder does not have default connections.
- Return type:
PipelineBuilder
- train(data, options=None)
Trains the pipeline’s trainable components (those implementing the TrainableComponent interface) on some training data.
Random Number Generation
If TrainingOptions.rng is set and is not a generator or bit generator (i.e., it is a seed), then this method wraps the seed in a SeedSequence and calls spawn() to generate a distinct seed for each component in the pipeline.
- Parameters:
data (Dataset) – The dataset to train on.
options (TrainingOptions | None) – The training options. If None, default options are used.
- Return type:
None
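The seed-spawning behavior described under Random Number Generation can be sketched with NumPy's SeedSequence. component_rngs is a hypothetical helper, and assigning spawned children to components in list order is an assumption; LensKit's own assignment of seeds to components may differ.

```python
import numpy as np


def component_rngs(seed: int, names: list[str]) -> dict[str, np.random.Generator]:
    """Hypothetical helper: one independent generator per component."""
    # Wrap the integer seed in a SeedSequence, then spawn one child per component.
    children = np.random.SeedSequence(seed).spawn(len(names))
    # Each child seeds its own Generator, giving statistically independent streams.
    return {name: np.random.default_rng(child) for name, child in zip(names, children)}


rngs = component_rngs(42, ["scorer", "ranker"])
print(rngs["scorer"].integers(0, 100, size=3))
print(rngs["ranker"].integers(0, 100, size=3))
```

Reusing the same seed reproduces every component's stream, while the streams of different components stay distinct, which is the point of spawning rather than handing the same seed to each component.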
- run(nodes=None, /, **kwargs)
Run the pipeline and obtain the return value(s) of one or more of its components. See Execution for details of the pipeline execution model.
- Parameters:
- Returns:
The pipeline result. If no nodes are supplied, this is the result of the default node. If a single node is supplied, it is the result of that node. If a tuple of nodes is supplied, it is a tuple of their results.
- Raises:
PipelineError – when there is a pipeline configuration error (e.g. a cycle).
ValueError – when one or more required inputs are missing.
TypeError – when one or more required inputs has an incompatible type.
other – exceptions thrown by components are passed through.
- Return type:
- run_all(*nodes, **kwargs)
Run all nodes in the pipeline, or all nodes required to fulfill the requested nodes, and return a mapping with the full pipeline state (the data attached to each node). This is useful when client code needs to inspect the data at arbitrary steps of the pipeline. It differs from run() in two ways:
- It returns the data from all nodes as a mapping (dictionary-like object), not just the specified nodes as a tuple.
- If no nodes are specified, it runs all nodes. As a consequence, it also runs nodes that are not required to fulfill the last node (such scenarios typically result from using use_first_of()).
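The difference between run() and run_all() can be illustrated with a toy graph runner. ToyPipeline and its query input are hypothetical and much simpler than LensKit's execution model (no type checking, no cycle detection, and a missing input raises KeyError rather than ValueError); the sketch only shows that run() returns the requested results, while run_all() returns the whole state, including nodes that nothing depends on when no nodes are requested.

```python
from typing import Any, Callable, Sequence, Union


class ToyPipeline:
    """Hypothetical DAG runner contrasting run() and run_all(); not LensKit code."""

    def __init__(self, nodes: dict[str, tuple[Callable[..., Any], list[str]]], default: str):
        # Maps each node name to (function, names of the nodes it consumes).
        self.nodes = nodes
        self.default = default

    def _eval(self, name: str, state: dict[str, Any]) -> Any:
        # Depth-first evaluation with memoization, so each node runs at most once.
        # Pipeline inputs are pre-loaded into `state`; a missing one raises KeyError.
        if name not in state:
            func, deps = self.nodes[name]
            state[name] = func(*[self._eval(dep, state) for dep in deps])
        return state[name]

    def run(self, nodes: Union[None, str, Sequence[str]] = None, /, **inputs: Any) -> Any:
        state = dict(inputs)
        if nodes is None:
            return self._eval(self.default, state)
        if isinstance(nodes, str):
            return self._eval(nodes, state)
        return tuple(self._eval(name, state) for name in nodes)

    def run_all(self, *nodes: str, **inputs: Any) -> dict[str, Any]:
        state = dict(inputs)
        # With no nodes requested, evaluate every node, needed or not.
        for name in nodes or self.nodes:
            self._eval(name, state)
        return state


pipe = ToyPipeline(
    {
        # Toy scores ignore the query; a real scorer would use it.
        "scores": (lambda query: {"a": 0.9, "b": 0.4, "c": 0.7}, ["query"]),
        "ranked": (lambda scores: sorted(scores, key=scores.get, reverse=True), ["scores"]),
        # A node nothing depends on, e.g. an alternative branch.
        "fallback": (lambda: ["popular-item"], []),
    },
    default="ranked",
)

print(pipe.run(query="u1"))  # ['a', 'c', 'b']
print(sorted(pipe.run_all("ranked", query="u1")))  # ['query', 'ranked', 'scores']
print(sorted(pipe.run_all(query="u1")))  # ['fallback', 'query', 'ranked', 'scores']
```

In this sketch, both methods build the same state dictionary; run() discards it after extracting the requested values, while run_all() hands it back to the caller.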