lenskit.pipeline.PipelineBuilder#
- class lenskit.pipeline.PipelineBuilder(name=None, version=None)#
Bases:
object
Builder for LensKit recommendation pipelines. Pipelines are the core abstraction for using LensKit models and other components to produce recommendations in a useful way. They allow you to wire together components in (mostly) arbitrary graphs, train them on data, and serialize the resulting pipelines to disk for use elsewhere.
The builder configures and builds pipelines that can then be run. If you have a scoring model and just want to generate recommendations with a default setup and minimal configuration, see topn_pipeline() or RecPipelineBuilder.
- Parameters:
- Stability:
- Caller (see Stability Levels).
Methods
__init__([name, version])
add_component(name, comp[, config]) – Add a component and connect it into the graph.
alias(alias, node) – Create an alias for a node.
build() – Build the pipeline.
build_config(*[, include_hash]) – Get this pipeline's configuration for serialization.
clear_inputs(node) – Remove input wirings for a node.
clone() – Clone the pipeline builder.
config_hash() – Get a hash of the pipeline's configuration to uniquely identify it for logging, version control, or other purposes.
connect(obj, **inputs) – Provide additional input connections for a component that has already been added.
create_input(name, *types) – Create an input node for the pipeline.
default_component(node) – Set the default node for the pipeline.
default_connection(name, node) – Set the default wiring for a component input.
from_config(config) – Reconstruct a pipeline builder from a serialized configuration.
from_pipeline(pipeline) – Create a builder initialized with a pipeline's internal state.
literal(value, *[, name]) – Create a literal node (a node with a fixed value).
meta(*[, include_hash]) – Get the metadata (name, version, hash, etc.) for this pipeline without returning the whole config.
node(node, *[, missing]) – Get the pipeline node with the specified name.
nodes() – Get the nodes in the pipeline graph.
replace_component(name, comp[, config]) – Replace a component in the graph.
use_first_of(name, primary, fallback) – Ergonomic method to create a new node that returns the result of its input if it is provided and not None, and otherwise returns the result of fallback.
validate() – Check the built pipeline for errors.
Attributes
- classmethod from_pipeline(pipeline)#
Create a builder initialized with a pipeline’s internal state. See Pipeline.modify() for details; that method is the main entry point, and this method exists to implement it.
- Parameters:
pipeline (Pipeline)
- Return type:
- meta(*, include_hash=True)#
Get the metadata (name, version, hash, etc.) for this pipeline without returning the whole config.
- Parameters:
include_hash (bool) – Whether to include a configuration hash in the metadata.
- Return type:
PipelineMeta
- node(node, *, missing='error')#
Get the pipeline node with the specified name. If passed a node, it returns the node or fails if the node is not a member of the pipeline.
- create_input(name, *types)#
Create an input node for the pipeline. Pipelines expect their inputs to be provided when they are run.
- Parameters:
- Returns:
A pipeline node representing this input.
- Raises:
ValueError – a node with the specified name already exists.
- Return type:
Node[T]
- literal(value, *, name=None)#
Create a literal node (a node with a fixed value).
Note
Literal nodes cannot be serialized with get_config() or save_config().
- Parameters:
value (T)
name (str | None)
- Return type:
LiteralNode[T]
- default_connection(name, node)#
Set the default wiring for a component input. Components that declare an input parameter with the specified name but no configured input will be wired to this node.
This is intended to be used for things like wiring up user parameters to semi-automatically receive the target user’s identity and history.
Important
Defaults are a feature of the builder only, and are resolved in build(). They are not included in the serialized configuration or in the resulting pipeline.
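The rule above (a declared input with no configured wiring receives the default) can be pictured with a small standalone sketch. It does not use LensKit: `resolve_defaults` is a hypothetical helper, and nodes are represented by plain strings purely for illustration.

```python
import inspect
from typing import Any, Callable


def resolve_defaults(
    component: Callable[..., Any],
    wired: dict[str, str],
    defaults: dict[str, str],
) -> dict[str, str]:
    """Return the wiring for ``component`` with default connections filled in.

    Hypothetical helper, not a LensKit function; node references are strings here.
    """
    resolved = dict(wired)
    for param in inspect.signature(component).parameters:
        # Only inputs that are declared but not explicitly wired get the default.
        if param not in resolved and param in defaults:
            resolved[param] = defaults[param]
    return resolved


def score(query, items):
    """Stand-in component with two declared inputs."""
    return items


defaults = {"query": "query-input"}

# `query` is not wired explicitly, so it picks up the default connection.
print(resolve_defaults(score, {"items": "candidates"}, defaults))

# An explicit wiring always wins over the default.
print(resolve_defaults(score, {"query": "custom-query", "items": "candidates"}, defaults))
```

Because the sketch returns a fully resolved wiring, nothing downstream needs to know that defaults existed, which mirrors the statement that defaults are a builder-only feature.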
- default_component(node)#
Set the default node for the pipeline. If Pipeline.run() is called without a node, then it will run this node (and all of its dependencies).
- alias(alias, node)#
Create an alias for a node. After aliasing, the node can be retrieved from node() using either its original name or its alias.
- Parameters:
- Raises:
ValueError – if the alias is already used as an alias or node name.
- Return type:
None
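As a rough mental model of the aliasing behavior described above, the following toy registry resolves a lookup through an alias table and rejects an alias that collides with an existing name. `NodeTable` is a hypothetical class written for this illustration; it is not LensKit's internal data structure.

```python
class NodeTable:
    """Toy name registry (hypothetical; for illustration only)."""

    def __init__(self) -> None:
        self._nodes: dict[str, object] = {}
        self._aliases: dict[str, str] = {}

    def add(self, name: str, node: object) -> None:
        # Node names and aliases share a single namespace.
        if name in self._nodes or name in self._aliases:
            raise ValueError(f"name {name!r} is already in use")
        self._nodes[name] = node

    def alias(self, alias: str, target: str) -> None:
        # Reject an alias that is already used as an alias or node name.
        if alias in self._nodes or alias in self._aliases:
            raise ValueError(f"alias {alias!r} is already in use")
        if target not in self._nodes:
            raise KeyError(target)
        self._aliases[alias] = target

    def node(self, name: str) -> object:
        # Look up by alias first, falling back to the name itself.
        return self._nodes[self._aliases.get(name, name)]


table = NodeTable()
table.add("als-scorer", "scorer node")
table.alias("scorer", "als-scorer")

# Both the original name and the alias resolve to the same node.
print(table.node("als-scorer") is table.node("scorer"))
```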
- add_component(name, comp, config=None, /, **inputs)#
Add a component and connect it into the graph.
- Parameters:
name (str) – The name of the component in the pipeline. The name must be unique in the pipeline (among both components and inputs).
cls – A component class.
config (CFG | None) – The configuration object for the component class.
instance – A raw function or pre-instantiated component.
inputs (Node[Any] | object) – The component’s input wiring. See Connections for details.
comp (ComponentConstructor[CFG, ND] | Component[ND, ...] | Callable[[...], ND])
- Returns:
The node representing this component in the pipeline.
- Return type:
Node[ND]
- replace_component(name, comp, config=None, /, **inputs)#
Replace a component in the graph. The new component must have a type that is compatible with the old component. Both input and output connections are retained, except for those overridden with keyword arguments.
- Parameters:
comp (ComponentConstructor[CFG, ND] | Component[ND, ...] | Callable[[...], ND]) – The component or constructor to use instead of the current node’s component.
config (CFG | None) – A configuration for the component (if passed as a class or constructor).
inputs (Node[Any] | object) – New input wiring(s) for the new component.
- Return type:
Node[ND]
- connect(obj, **inputs)#
Provide additional input connections for a component that has already been added. See Connections for details.
- Parameters:
obj (str | Node[Any]) – The name or node of the component to wire.
inputs (Node[Any] | str | object) – The component’s input wiring. For each keyword argument in the component’s function signature, that argument can be provided here with an input that the pipeline will provide to that argument of the component when the pipeline is run.
- clear_inputs(node)#
Remove input wirings for a node.
- validate()#
Check the built pipeline for errors.
- clone()#
Clone the pipeline builder. The resulting builder starts as a copy of this builder, and any subsequent modifications affect only the copy to which they are applied.
- Return type:
- build_config(*, include_hash=True)#
Get this pipeline’s configuration for serialization. The configuration consists of all inputs and components along with their configurations and input connections. It can be serialized to disk (in JSON, YAML, or a similar format) to save a pipeline.
The configuration does not include any trained parameter values, although the configuration may include things such as paths to checkpoints to load such parameters, depending on the design of the components in the pipeline.
Note
Literal nodes (from literal(), or literal values wired to inputs) cannot be serialized, and this method will fail if they are present in the pipeline.
- Parameters:
include_hash (bool)
- Return type:
- config_hash()#
Get a hash of the pipeline’s configuration to uniquely identify it for logging, version control, or other purposes.
The hash format and algorithm are not guaranteed, but hashes are stable within a LensKit version. For the same version of LensKit and component code, the same configuration will produce the same hash, so long as there are no literal nodes. Literal nodes will usually hash consistently, but since literals other than basic JSON values are hashed by pickling, hash stability depends on the stability of the pickle bytestream.
In LensKit 2025.1, the configuration hash is computed by serializing the pipeline configuration (without a hash) to JSON and returning the hex-encoded SHA-256 hash of that serialization.
- Return type:
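The hashing scheme described for LensKit 2025.1 (JSON serialization of the hash-free configuration, then a hex-encoded SHA-256 digest) can be sketched with the standard library alone. This is an illustration of the idea, not LensKit's code: the function name `sketch_config_hash`, the dictionary layout, and the use of sorted keys with compact separators are all assumptions, so the digests it produces should not be expected to match real LensKit hashes.

```python
import hashlib
import json


def sketch_config_hash(config: dict) -> str:
    """Hash a configuration dict roughly as described above (illustrative only)."""
    # Leave out any stored hash so the value does not depend on itself.
    body = {key: value for key, value in config.items() if key != "hash"}
    # sort_keys and compact separators are assumptions made here for determinism.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical configuration layout, used only to exercise the function.
config = {
    "meta": {"name": "demo", "version": "1"},
    "components": {"scorer": {"code": "example:Scorer", "config": {"features": 50}}},
}

digest = sketch_config_hash(config)
print(len(digest))  # SHA-256 yields 64 hex characters

# Storing the hash alongside the configuration does not change the hash.
print(sketch_config_hash({**config, "hash": digest}) == digest)
```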
- classmethod from_config(config)#
Reconstruct a pipeline builder from a serialized configuration.
- Parameters:
config (object) – The configuration object, as loaded from JSON, TOML, YAML, or similar. Will be validated into a PipelineConfig.
- Returns:
The configured (but not trained) pipeline.
- Raises:
PipelineError – If there is a configuration error reconstructing the pipeline.
- Warns:
PipelineWarning – If the configuration is irregular but usable; for example, the configuration includes a hash but the constructed pipeline does not have a matching hash.
- Return type:
- use_first_of(name, primary, fallback)#
Ergonomic method to create a new node that returns the result of its input if it is provided and not None, and otherwise returns the result of fallback. This method is used for things like filling in optional pipeline inputs. For example, if you want the pipeline to take candidate items through an items input, but look them up from the user’s history and the training data if items is not supplied, you would do:

```python
pipe = PipelineBuilder()
# allow candidate items to be optionally specified
items = pipe.create_input('items', list[EntityId], None)
# find candidates from the training data (optional);
# `history` is assumed to be a node created earlier that supplies the user's history
lookup_candidates = pipe.add_component(
    'select-candidates',
    UnratedTrainingItemsCandidateSelector(),
    user=history,
)
# if the client provided items as a pipeline input, use those; otherwise
# use the candidate selector we just configured.
candidates = pipe.use_first_of('candidates', items, lookup_candidates)
```
Note
This method does not distinguish between an input being unspecified and explicitly specified as None.
Note
This method does not implement item-level fallbacks, only fallbacks at the level of entire results. For item-level score fallbacks, see FallbackScorer.
Note
If one of the fallback elements is a component A that depends on another component or input B, and B is missing or returns None such that A would usually fail, then A will be skipped and the fallback will move on to the next node. This works with arbitrarily deep transitive chains.
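The selection rule itself (use the primary result unless it is None, otherwise use the fallback) is small enough to sketch without LensKit. In the standalone illustration below, `first_of` is a hypothetical function, and modeling both sides as zero-argument callables so the fallback only runs when needed is a choice of this sketch rather than a statement about how LensKit evaluates nodes.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def first_of(
    primary: Callable[[], Optional[T]],
    fallback: Callable[[], Optional[T]],
) -> Optional[T]:
    """Return the primary result unless it is None, else the fallback result."""
    value = primary()
    if value is not None:
        return value
    # An unspecified input and an explicit None both end up here.
    return fallback()


calls: list[str] = []


def lookup_candidates() -> list[int]:
    """Stand-in for a candidate selector; records each invocation."""
    calls.append("lookup")
    return [10, 20, 30]


# The client supplied items, so the fallback is never invoked.
supplied = first_of(lambda: [1, 2], lookup_candidates)

# No items were supplied (None), so the whole result comes from the fallback.
defaulted = first_of(lambda: None, lookup_candidates)

print(supplied, defaulted, calls)
```

Note that the fallback replaces the entire result; nothing here merges the two lists item by item, matching the note above about result-level fallbacks.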