lenskit.pipeline.PipelineBuilder

class lenskit.pipeline.PipelineBuilder(name=None, version=None)

Bases: object

Builder for LensKit recommendation pipelines. Pipelines are the core abstraction for using LensKit models and other components to produce recommendations in a useful way. They allow you to wire together components in (mostly) arbitrary graphs, train them on data, and serialize the resulting pipelines to disk for use elsewhere.

The builder configures and builds pipelines that can then be run. If you have a scoring model and just want to generate recommendations with a default setup and minimal configuration, see topn_pipeline() or RecPipelineBuilder.

Parameters:
  • name (str | None) – A name for the pipeline.

  • version (str | None) – A numeric version for the pipeline.

Stability:
Caller (see Stability Levels).

__init__(name=None, version=None)
Parameters:
  • name (str | None)

  • version (str | None)

Methods

__init__([name, version])

add_component(name, comp[, config])

Add a component and connect it into the graph.

alias(alias, node)

Create an alias for a node.

build()

Build the pipeline.

build_config(*[, include_hash])

Get this pipeline's configuration for serialization.

clear_inputs(node)

Remove input wirings for a node.

clone()

Clone the pipeline builder.

config_hash()

Get a hash of the pipeline's configuration to uniquely identify it for logging, version control, or other purposes.

connect(obj, **inputs)

Provide additional input connections for a component that has already been added.

create_input(name, *types)

Create an input node for the pipeline.

default_component(node)

Set the default node for the pipeline.

default_connection(name, node)

Set the default wiring for a component input.

from_config(config)

Reconstruct a pipeline builder from a serialized configuration.

from_pipeline(pipeline)

Create a builder initialized with a pipeline's internal state.

literal(value, *[, name])

Create a literal node (a node with a fixed value).

meta(*[, include_hash])

Get the metadata (name, version, hash, etc.) for this pipeline without returning the whole config.

node(node, *[, missing])

Get the pipeline node with the specified name.

nodes()

Get the nodes in the pipeline graph.

replace_component(name, comp[, config])

Replace a component in the graph.

use_first_of(name, primary, fallback)

Ergonomic method to create a new node that returns the result of primary if it is provided and not None, and otherwise returns the result of fallback.

validate()

Check the built pipeline for errors.

Attributes

name

The pipeline name.

version

The pipeline version string.

name: str | None = None

The pipeline name.

version: str | None = None

The pipeline version string.

classmethod from_pipeline(pipeline)

Create a builder initialized with a pipeline’s internal state. See Pipeline.modify() for details; that method is the main entry point, and this one exists to implement it.

Parameters:

pipeline (Pipeline)

Return type:

PipelineBuilder

meta(*, include_hash=True)

Get the metadata (name, version, hash, etc.) for this pipeline without returning the whole config.

Parameters:

include_hash (bool) – Whether to include a configuration hash in the metadata.

Return type:

PipelineMeta

nodes()

Get the nodes in the pipeline graph.

Return type:

list[Node[object]]

node(node, *, missing='error')

Get the pipeline node with the specified name. If passed a node, it returns the node or fails if the node is not a member of the pipeline.

Parameters:
  • node (str | Node[Any]) – The name of the pipeline node to look up, or a node to check for membership.

  • missing (Literal['error', 'none'] | None)

Returns:

The pipeline node, if it exists.

Raises:

KeyError – The specified node does not exist.

Return type:

Node[object] | None

create_input(name, *types)

Create an input node for the pipeline. Pipelines expect their inputs to be provided when they are run.

Parameters:
  • name (str) – The name of the input. The name must be unique in the pipeline (among both components and inputs).

  • types (type[T] | UnionType | None) – The allowable types of the input; input data can be of any specified type. If None is among the allowed types, the input can be omitted.

Returns:

A pipeline node representing this input.

Raises:

ValueError – a node with the specified name already exists.

Return type:

Node[T]
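The type rule for inputs can be modelled with a small stand-alone sketch. It does not use LensKit: accepts is a hypothetical helper, and it handles only plain classes (not parameterized generics such as list[int]), so it illustrates the documented behaviour rather than the library's implementation.

```python
def accepts(value, *types):
    """Return True if value is allowed for an input declared with these types.

    Hypothetical helper: it models the documented rule only.
    """
    # None among the declared types marks the input as optional.
    optional = None in types
    if value is None:
        return optional
    # Any one of the remaining types is acceptable.
    concrete = tuple(t for t in types if t is not None)
    return isinstance(value, concrete)


print(accepts(42, int, str))     # True
print(accepts(None, int, None))  # True
print(accepts(None, int))        # False
print(accepts(3.5, int, str))    # False
```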

literal(value, *, name=None)

Create a literal node (a node with a fixed value).

Note

Literal nodes cannot be serialized with get_config() or save_config().

Parameters:
  • value (T)

  • name (str | None)

Return type:

LiteralNode[T]

default_connection(name, node)

Set the default wiring for a component input. Components that declare an input parameter with the specified name but no configured input will be wired to this node.

This is intended to be used for things like wiring up user parameters to semi-automatically receive the target user’s identity and history.

Important

Defaults are a feature of the builder only, and are resolved in build(). They are not included in the serialized configuration or in the resulting pipeline.
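As a rough model of how default connections might be resolved, the sketch below fills in only those parameters that a component declares but that have no configured input. It is plain Python, not LensKit code; resolve_defaults, score, and the string node names are all hypothetical.

```python
import inspect


def resolve_defaults(component, wired, defaults):
    """Return the full wiring for a component function.

    component: a plain function standing in for a pipeline component.
    wired: explicitly configured inputs (parameter name -> source).
    defaults: default connections (parameter name -> source).
    All names are hypothetical; this models the documented rule only.
    """
    result = dict(wired)
    for param in inspect.signature(component).parameters:
        # Only fill parameters that have no configured input.
        if param not in result and param in defaults:
            result[param] = defaults[param]
    return result


def score(user, items):
    return [(user, item) for item in items]


defaults = {"user": "user-input-node"}
print(resolve_defaults(score, {"items": "candidate-node"}, defaults))
# {'items': 'candidate-node', 'user': 'user-input-node'}
print(resolve_defaults(score, {"user": "other-node"}, defaults))
# {'user': 'other-node'}
```

An explicit wiring always wins over a default, and a parameter with neither stays unwired.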

Parameters:
  • name (str) – The name of the parameter to set a default for.

  • node (Node[Any] | object) – The node or literal value to wire to this parameter.

Return type:

None

default_component(node)

Set the default node for the pipeline. If Pipeline.run() is called without a node, then it will run this node (and all of its dependencies).

Parameters:

node (str | Node[Any])

Return type:

None

alias(alias, node)

Create an alias for a node. After aliasing, the node can be retrieved from node() using either its original name or its alias.
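The lookup behaviour can be pictured with a toy name table. NodeRegistry is a hypothetical stand-in written for this illustration, not LensKit's internal data structure; it only shows a node being reachable under both names and a colliding alias being rejected.

```python
class NodeRegistry:
    """Toy name table; a hypothetical stand-in for illustration."""

    def __init__(self):
        self._nodes = {}    # node name -> node object
        self._aliases = {}  # alias -> node name

    def add(self, name, node):
        if name in self._nodes or name in self._aliases:
            raise ValueError(f"name {name!r} is already in use")
        self._nodes[name] = node

    def alias(self, alias, name):
        # An alias may not collide with an existing alias or node name.
        if alias in self._nodes or alias in self._aliases:
            raise ValueError(f"alias {alias!r} is already in use")
        if name not in self._nodes:
            raise KeyError(name)
        self._aliases[alias] = name

    def node(self, name):
        # Resolve an alias to the original name, then look the node up.
        return self._nodes[self._aliases.get(name, name)]


reg = NodeRegistry()
reg.add("scorer", object())
reg.alias("model", "scorer")
print(reg.node("model") is reg.node("scorer"))  # True
```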

Parameters:
  • alias (str) – The alias to add to the node.

  • node (Node[Any] | str) – The node (or node name) to alias.

Raises:

ValueError – if the alias is already used as an alias or node name.

Return type:

None

add_component(name, comp, config=None, /, **inputs)

Add a component and connect it into the graph.

Parameters:
  • name (str) – The name of the component in the pipeline. The name must be unique in the pipeline (among both components and inputs).

  • comp (ComponentConstructor[CFG, ND] | Component[ND, ...] | Callable[[...], ND]) – The component to add: a component class, a pre-instantiated component, or a raw function.

  • config (CFG | None) – The configuration object for the component class.

  • inputs (Node[Any] | object) – The component’s input wiring. See Connections for details.

Returns:

The node representing this component in the pipeline.

Return type:

Node[ND]

replace_component(name, comp, config=None, /, **inputs)

Replace a component in the graph. The new component must have a type that is compatible with the old component. Both input and output connections are retained, except for those overridden with keyword arguments.

Parameters:
  • name (str | Node[ND]) – The name or node to replace.

  • comp (ComponentConstructor[CFG, ND] | Component[ND, ...] | Callable[[...], ND]) – The component or constructor to use instead of the current node’s component.

  • config (CFG | None) – A configuration for the component (if passed as a class or constructor).

  • inputs (Node[Any] | object) – New input wiring(s) for the new component.

Return type:

Node[ND]

connect(obj, **inputs)

Provide additional input connections for a component that has already been added. See Connections for details.

Parameters:
  • obj (str | Node[Any]) – The name or node of the component to wire.

  • inputs (Node[Any] | str | object) – The component’s input wiring. For each keyword argument in the component’s function signature, that argument can be provided here with an input that the pipeline will provide to that argument of the component when the pipeline is run.

clear_inputs(node)

Remove input wirings for a node.

Parameters:

node (str | Node[Any]) – The node whose input wiring should be removed.

validate()

Check the built pipeline for errors.

clone()

Clone the pipeline builder. The resulting builder starts as a copy of this builder, and any subsequent modifications affect only the copy to which they are applied.

Return type:

PipelineBuilder

build_config(*, include_hash=True)

Get this pipeline’s configuration for serialization. The configuration consists of all inputs and components along with their configurations and input connections. It can be serialized to disk (in JSON, YAML, or a similar format) to save a pipeline.

The configuration does not include any trained parameter values, although the configuration may include things such as paths to checkpoints to load such parameters, depending on the design of the components in the pipeline.

Note

Literal nodes (from literal(), or literal values wired to inputs) cannot be serialized, and this method will fail if they are present in the pipeline.

Parameters:

include_hash (bool)

Return type:

PipelineConfig

config_hash()

Get a hash of the pipeline’s configuration to uniquely identify it for logging, version control, or other purposes.

The hash format and algorithm are not guaranteed, but hashes are stable within a LensKit version. For the same version of LensKit and component code, the same configuration will produce the same hash, so long as there are no literal nodes. Literal nodes will usually hash consistently, but since literals other than basic JSON values are hashed by pickling, hash stability depends on the stability of the pickle bytestream.

In LensKit 2025.1, the configuration hash is the hex-encoded SHA256 hash of the JSON serialization of the pipeline configuration (serialized without a hash).
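The general recipe (serialize to JSON, hash with SHA256, hex-encode) can be sketched with the standard library alone. This is not LensKit's code: config_digest, the "hash" key, and the configuration layout are hypothetical, and sorted keys with compact separators are assumptions made so the output is deterministic. The digest will therefore not match what config_hash() returns.

```python
import hashlib
import json


def config_digest(config):
    """Hex-encoded SHA256 of a JSON serialization of config.

    Sorting keys and using compact separators are assumptions made here;
    LensKit's exact serialization may differ.
    """
    # Drop any stored hash so the digest covers only the configuration.
    body = {k: v for k, v in config.items() if k != "hash"}
    text = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical configuration layout, for illustration only.
config = {
    "meta": {"name": "demo", "version": "1"},
    "inputs": [{"name": "user", "types": ["int"]}],
    "components": {"scorer": {"code": "example:Scorer", "inputs": {"user": "user"}}},
}
digest = config_digest(config)
print(len(digest))                                          # 64
print(digest == config_digest({**config, "hash": digest}))  # True
```

Because the stored hash is excluded before hashing, a configuration that carries its own hash still produces the same digest, which is what makes a mismatch check on load possible.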

Return type:

str

classmethod from_config(config)

Reconstruct a pipeline builder from a serialized configuration.

Parameters:

config (object) – The configuration object, as loaded from JSON, TOML, YAML, or similar. Will be validated into a PipelineConfig.

Returns:

The configured (but not trained) pipeline.

Raises:

PipelineError – If there is a configuration error reconstructing the pipeline.

Warns:

PipelineWarning – If the configuration is funny but usable; for example, the configuration includes a hash but the constructed pipeline does not have a matching hash.

Return type:

Self

use_first_of(name, primary, fallback)

Ergonomic method to create a new node that returns the result of primary if it is provided and not None, and otherwise returns the result of fallback. This method is used for things like filling in optional pipeline inputs. For example, if you want the pipeline to take candidate items through an items input, but look them up from the user’s history and the training data if items is not supplied, you would do:

pipe = PipelineBuilder()
# allow candidate items to be optionally specified
items = pipe.create_input('items', list[EntityId], None)
# find candidates from the training data (optional); `history` is assumed
# to be a previously created node that supplies the user's history
lookup_candidates = pipe.add_component(
    'select-candidates', UnratedTrainingItemsCandidateSelector(),
    user=history,
)
# if the client provided items as a pipeline input, use those; otherwise
# use the candidate selector we just configured.
candidates = pipe.use_first_of('candidates', items, lookup_candidates)

Note

This method does not distinguish between an input being unspecified and explicitly specified as None.

Note

This method does not implement item-level fallbacks, only fallbacks at the level of entire results. For item-level score fallbacks, see FallbackScorer.

Note

If one of the fallback elements is a component A that depends on another component or input B, and B is missing or returns None such that A would usually fail, then A will be skipped and the fallback will move on to the next node. This works with arbitrarily-deep transitive chains.
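The selection rule itself is small enough to model in plain Python. In the sketch below, first_of and lookup_candidates are hypothetical names, zero-argument callables stand in for pipeline nodes, and evaluating the fallback only when needed is an assumption of the sketch rather than a documented guarantee.

```python
def first_of(primary, fallback):
    """Return primary() unless it is None, else fallback().

    primary and fallback are zero-argument callables standing in for
    pipeline nodes; the names are hypothetical.
    """
    value = primary()
    if value is not None:
        return value
    # In this sketch the fallback runs only when the primary yields None.
    return fallback()


calls = []


def lookup_candidates():
    calls.append("lookup")
    return ["i3", "i4"]


print(first_of(lambda: ["i1", "i2"], lookup_candidates))  # ['i1', 'i2']
print(calls)                                              # []
print(first_of(lambda: None, lookup_candidates))          # ['i3', 'i4']
print(calls)                                              # ['lookup']
```

Only None triggers the fallback here; an empty list counts as a provided value and is returned as-is.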

Parameters:
  • name (str) – The name of the node.

  • primary (Node[T | None]) – The node to use as the primary input, if it is available.

  • fallback (Node[T]) – The node to use if the primary input does not provide a value.

Return type:

Node[T]

build()

Build the pipeline.

Return type:

Pipeline