Random Number Generation

Current best practice for reproducible science in machine learning — including, but not limited to, recommender systems — is to use fixed random seeds so results can be reproduced precisely. This is useful both for reproducing the results themselves and for debugging.

To test for seed sensitivity, the entire experiment can be re-run with a different random seed and the conclusions compared.
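A seed-sensitivity check can be sketched with NumPy alone. The `toy_experiment` function below is a hypothetical stand-in for a real experiment, not part of LensKit. It only illustrates the pattern: the same seed reproduces a result exactly, and a different seed should change the numbers but not the conclusion.

```python
import numpy as np


def toy_experiment(seed: int, n: int = 1000) -> float:
    """Hypothetical experiment: a randomized estimate of a mean."""
    rng = np.random.default_rng(seed)
    sample = rng.normal(loc=0.5, scale=1.0, size=n)
    return float(sample.mean())


# Same seed: the result is reproduced exactly.
first = toy_experiment(42)
repeat = toy_experiment(42)

# Different seed: the estimate changes slightly.
other = toy_experiment(7)

# The conclusion under test ("the mean is positive") should hold for both seeds.
same_conclusion = (first > 0) == (other > 0)
```

If the conclusion flips between seeds, the result depends on the seed and should not be reported from a single run.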

LensKit is built to support this experimental design, making consistent use of configurable random number generators throughout its algorithm implementations. When run against NumPy 1.17 or later, it uses the new numpy.random.Generator and numpy.random.SeedSequence facilities to provide consistent random number generation and initialization. LensKit is compatible with older versions of NumPy, but the RNG reproducibility logic will not fully function, and some functions will not work.

Note

For fully reproducible research, including random seeds and their use, make sure that you are running on the same platform with the same versions of all packages (particularly LensKit, NumPy, SciPy, Pandas, and related packages), and are using at least NumPy 1.17. LensKit manages state for older versions of NumPy on a best-effort basis.

Developers using LensKit will be primarily interested in the init_rng() function, which initializes LensKit’s random seed. LensKit components that use randomization also take an rng option, usually in their constructor, to set the seed on a per-operation basis. If the script is straightforward and performs LensKit operations in a deterministic order (e.g. does not train multiple models in parallel), initializing the global RNG is sufficient.

Developers writing new LensKit algorithms that use randomization will also need to pay attention to the rng() function, along with derivable_rng() and derive_seed() if predictions or recommendations, not just model training, require random values. Their constructors should take a parameter rng_spec to specify the RNG initialization.

Seeds

LensKit random number generation starts from a global root seed, accessible with get_root_seed(). This seed can be initialized with init_rng().

lenskit.util.random.init_rng(seed, *keys, propagate=True)

Initialize the random infrastructure with a seed. This function should generally be called very early in the setup.

Parameters
  • seed (int or numpy.random.SeedSequence) – The random seed to initialize with.

  • keys – Additional keys, to use as a spawn_key on NumPy 1.17 or later. Passed to derive_seed().

  • propagate (bool) –

    If True, initialize other RNG infrastructure. This currently initializes:

    If propagate=False, LensKit is still fully seeded: no component included with LensKit uses any of the global RNGs; they all use RNGs seeded with the specified seed.

Returns

The random seed.

lenskit.util.random.derive_seed(*keys, base=None, none_on_old_numpy=False)

Derive a seed from the root seed, optionally with additional seed keys.

Parameters
  • keys (list of int or str) – Additional components to add to the spawn key for reproducible derivation. If unspecified, the seed’s internal counter is incremented (by calling numpy.random.SeedSequence.spawn()).

  • base (numpy.random.SeedSequence) – The base seed to use. If None, uses the root seed.

  • none_on_old_numpy (bool) – If True, return None instead of raising NotImplementedError if running on an old version of NumPy.
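The key-based derivation described above can be illustrated with plain NumPy. This is a minimal sketch, not LensKit's implementation: the helper names `key_to_int` and `derive` are hypothetical, and hashing string keys with CRC32 is an assumption made only so that string keys can be placed in a spawn key.

```python
import zlib

import numpy as np


def key_to_int(key):
    """Map an int or str key to a non-negative integer (hypothetical helper)."""
    if isinstance(key, str):
        # Assumption: any stable hash would do; CRC32 is used for illustration.
        return zlib.crc32(key.encode("utf8"))
    return int(key)


def derive(base, *keys):
    """Derive a child SeedSequence from `base` (illustrative only)."""
    if not keys:
        # No keys: let NumPy advance the base seed's internal spawn counter.
        return base.spawn(1)[0]
    extra = tuple(key_to_int(k) for k in keys)
    # With keys: extend the spawn key, so the same keys always give the same seed.
    return np.random.SeedSequence(base.entropy, spawn_key=base.spawn_key + extra)


root = np.random.SeedSequence(42)

a = derive(root, "als", 10)
b = derive(root, "als", 10)  # same keys as `a`
c = derive(root, "als", 11)  # different keys

s1 = derive(root)  # counter-based: first spawned child
s2 = derive(root)  # counter-based: second spawned child
```

Keyed derivation does not depend on call order, which is what makes it suitable for code that may run in a different order from one run to the next. Counter-based derivation is reproducible only if the calls happen in the same order.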

lenskit.util.random.get_root_seed()

Get the root seed.

Returns

The LensKit root seed.

Return type

numpy.random.SeedSequence

Random Number Generators

These functions create actual RNGs from the LensKit global seed or a user-provided seed. They can produce both new-style numpy.random.Generator RNGs and legacy numpy.random.mtrand.RandomState; the latter is needed because some libraries, such as Pandas and scikit-learn, do not yet know what to do with a new-style RNG.
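For reference, both kinds of RNG can be built from one numpy.random.SeedSequence using NumPy directly. The sketch below shows only the NumPy side of this (it requires NumPy 1.17 or later) and makes no claim about how LensKit constructs its generators.

```python
import numpy as np

seed = np.random.SeedSequence(42)

# New-style generator, for code that accepts numpy.random.Generator.
gen = np.random.default_rng(seed)

# Legacy generator, for libraries that expect a RandomState.
# RandomState accepts a bit generator, which is in turn seeded from the SeedSequence.
legacy = np.random.RandomState(np.random.MT19937(seed))

x = gen.integers(0, 10, size=3)
y = legacy.randint(0, 10, size=3)
```

The two objects are seeded from the same value but use different bit generators, so they are each reproducible without producing the same stream as one another.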

lenskit.util.random.rng(spec=None, *, legacy=False)

Get a random number generator. This is similar to sklearn.utils.check_random_state(), but it usually returns a numpy.random.Generator instead.

Parameters
Returns

A random number generator.

Return type

numpy.random.Generator

lenskit.util.random.derivable_rng(spec, *, legacy=False)

Get a derivable RNG, for use cases where the code needs to be able to reproducibly derive sub-RNGs for different keys, such as user IDs.

Parameters

spec

Any value supported by the spec parameter of rng(), in addition to the following values:

  • the string 'user'

  • a tuple of the form (seed, 'user')

Either of these forms will cause the returned function to re-derive new RNGs.

Returns

A function taking one (or more) key values, like derive_seed(), and returning a random number generator (the type of which is determined by the legacy parameter).

Return type

function
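The re-deriving behavior can be pictured as a closure over a base seed. The following is a minimal sketch under that assumption: `make_derivable` is a hypothetical name, it handles only integer keys, and it covers only the case where a new RNG is derived for each key.

```python
import numpy as np


def make_derivable(seed):
    """Return a function mapping keys to reproducible RNGs (hypothetical helper)."""
    base = np.random.SeedSequence(seed)

    def for_keys(*keys):
        # Extend the base spawn key with the caller's keys (e.g. a user ID).
        spawn_key = base.spawn_key + tuple(int(k) for k in keys)
        child = np.random.SeedSequence(base.entropy, spawn_key=spawn_key)
        return np.random.default_rng(child)

    return for_keys


rng_for = make_derivable(42)

# Draws for user 10 are the same no matter what was requested in between.
u10_first = rng_for(10).random(3)
u11 = rng_for(11).random(3)
u10_again = rng_for(10).random(3)
```

Because each user's RNG depends only on the base seed and the user ID, randomized recommendations for one user stay reproducible even when users are processed in a different order or in parallel.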