Utility Functions
These utility functions are useful for data processing.
Miscellaneous
Miscellaneous utility functions.
- lenskit.util.log_to_stderr(level=20)
Set up the logging infrastructure to show log output on sys.stderr, where it will appear in the IPython message log.
- lenskit.util.log_to_notebook(level=20)
Set up the logging infrastructure to show log output in the Jupyter notebook.
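To illustrate the kind of setup these logging helpers perform, here is a minimal sketch using only the standard library. It is not LensKit's implementation: the name sketch_log_to_stderr and the message format are assumptions made for this example. The default level of 20 in the signatures above corresponds to logging.INFO.

```python
import logging
import sys


def sketch_log_to_stderr(level=logging.INFO):
    """Hypothetical stand-in: route log records at `level` and above to stderr."""
    handler = logging.StreamHandler(sys.stderr)
    handler.setLevel(level)
    # The format string is an arbitrary choice for this sketch.
    handler.setFormatter(logging.Formatter("[%(levelname)7s] %(name)s %(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(level)
    return handler


handler = sketch_log_to_stderr()
logging.getLogger("demo").info("logging is configured")
```

With LensKit installed, calling lenskit.util.log_to_stderr() once at the top of a script or notebook should serve the same purpose.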
- class lenskit.util.Stopwatch(start=True)
Bases: object
Timer class for recording elapsed wall time in operations.
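The listing above does not show the methods of Stopwatch, so the following is only a sketch of how a wall-time timer with a start=True constructor argument could work. The class name SketchStopwatch and its start, stop, and elapsed methods are assumptions for illustration, not the documented LensKit interface.

```python
import time


class SketchStopwatch:
    """Hypothetical wall-clock timer; not the LensKit class."""

    def __init__(self, start=True):
        self.start_time = None
        self.stop_time = None
        if start:
            self.start()

    def start(self):
        # perf_counter is a monotonic clock suited to measuring intervals.
        self.start_time = time.perf_counter()
        self.stop_time = None

    def stop(self):
        self.stop_time = time.perf_counter()

    def elapsed(self):
        """Seconds since start, measured up to stop() if it was called."""
        if self.start_time is None:
            raise RuntimeError("stopwatch was never started")
        end = self.stop_time if self.stop_time is not None else time.perf_counter()
        return end - self.start_time


timer = SketchStopwatch()
total = sum(range(100_000))  # some work to time
timer.stop()
seconds = timer.elapsed()
```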
- lenskit.util.rng(spec=None, *, legacy=False)
Get a random number generator. This is similar to sklearn.utils.check_random_state(), but it usually returns a numpy.random.Generator instead.
Warning
This method is deprecated. Use seedbank.numpy_rng() instead.
- Parameters
spec – The spec for this RNG. Can be any of the following types:
legacy (bool) – If True, return numpy.random.mtrand.RandomState instead of a new-style numpy.random.Generator.
- Returns
A random number generator.
- Return type
- lenskit.util.init_rng(seed, *keys, propagate=True)
Initialize the random infrastructure with a seed. This function should generally be called very early in the setup.
Warning
This method is deprecated. Use seedbank.initialize() instead.
- Parameters
seed (int or numpy.random.SeedSequence) – The random seed to initialize with.
keys – Additional keys, to use as a spawn_key on NumPy 1.17. Passed to derive_seed().
propagate (bool) – If True, initialize other RNG infrastructure. This currently initializes: np.random.seed(). If propagate=False, LensKit is still fully seeded: no component included with LensKit uses any of the global RNGs; they all use RNGs seeded with the specified seed.
- Returns
The random seed.
- lenskit.util.derivable_rng(spec, *, legacy=False)
Get a derivable RNG, for use cases where the code needs to be able to reproducibly derive sub-RNGs for different keys, such as user IDs.
- Parameters
spec – Any value supported by the seed parameter of seedbank.numpy_rng(), in addition to the following values:
the string 'user'
a tuple of the form (seed, 'user')
Either of these forms will cause the returned function to re-derive new RNGs.
- Returns
A function taking one (or more) key values, like derive_seed(), and returning a random number generator (the type of which is determined by the legacy parameter).
- Return type
function
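The idea of reproducibly deriving sub-RNGs from keys can be sketched directly with NumPy's SeedSequence and its spawn_key argument. This is an illustration under stated assumptions, not LensKit's implementation: sketch_derivable_rng is a hypothetical name, keys are assumed to be non-negative integers (such as numeric user IDs), and only new-style Generator objects are returned.

```python
import numpy as np


def sketch_derivable_rng(seed):
    """Hypothetical helper: return a function mapping keys to reproducible RNGs."""

    def derive(*keys):
        # The same seed and keys always produce the same SeedSequence,
        # and therefore the same random stream.
        seq = np.random.SeedSequence(seed, spawn_key=tuple(int(k) for k in keys))
        return np.random.default_rng(seq)

    return derive


derive = sketch_derivable_rng(42)
a = derive(7).integers(0, 1000, size=5)  # stream for key 7
b = derive(7).integers(0, 1000, size=5)  # same key, same stream
c = derive(8).integers(0, 1000, size=5)  # different key, independent stream
```

Deriving by key rather than drawing from one shared generator means the values produced for a given user do not depend on the order in which users are processed.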
- lenskit.util.proc_count(core_div=2, max_default=None, level=0)
Get the number of desired jobs for multiprocessing operations. This does not affect Numba or MKL multithreading.
This count can come from a number of sources:
The LK_NUM_PROCS environment variable
The number of CPUs, divided by core_div (default 2)
- Parameters
core_div (int or None) – The divisor to scale down the number of cores; None to turn off core-based fallback.
max_default – The maximum number of processes to use if the environment variable is not configured.
level – The process nesting level. 0 is the outermost level of parallelism; subsequent levels control nesting. Levels deeper than 1 are rare, and it isn’t expected that callers actually have an accurate idea of the threading nesting, just that they are configuring a child. If the process count is unconfigured, then level 1 will use core_div, and deeper levels will use 1.
- Returns
The number of jobs desired.
- Return type
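The resolution order described for proc_count can be sketched as follows. This is a simplified model, not the LensKit source. The function name sketch_proc_count and the environ and n_cpus parameters (added so the logic can be exercised without touching the real environment) are assumptions. The sketch also assumes that LK_NUM_PROCS holds a single integer that applies only to level 0, and that 1 is returned when core_div is None and the variable is unset.

```python
import os


def sketch_proc_count(core_div=2, max_default=None, level=0, *, environ=None, n_cpus=None):
    """Hypothetical model of the job-count resolution described above."""
    environ = os.environ if environ is None else environ
    n_cpus = (os.cpu_count() or 1) if n_cpus is None else n_cpus

    configured = environ.get("LK_NUM_PROCS")
    if configured is not None:
        # Assumption: one integer, used for the outermost level only.
        return int(configured) if level == 0 else 1

    if core_div is None:
        # Assumption: with the core-based fallback turned off, use one job.
        return 1

    if level == 0:
        count = max(n_cpus // core_div, 1)
        if max_default is not None:
            count = min(count, max_default)
        return count
    if level == 1:
        return core_div  # unconfigured: level 1 uses core_div
    return 1  # unconfigured: deeper levels use 1


outer = sketch_proc_count(environ={}, n_cpus=16)            # 16 // 2
capped = sketch_proc_count(max_default=4, environ={}, n_cpus=16)
inner = sketch_proc_count(level=1, environ={}, n_cpus=16)
forced = sketch_proc_count(environ={"LK_NUM_PROCS": "6"}, n_cpus=16)
```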
- lenskit.util.clone(algo)
Clone an algorithm, but not its fitted data. This is like sklearn.base.clone(), but may not work on arbitrary SciKit estimators. LensKit algorithms are compatible with SciKit clone, however, so feel free to use that if you need more general capabilities.
This function is somewhat derived from the SciKit one.
>>> from lenskit.algorithms.bias import Bias
>>> orig = Bias()
>>> copy = clone(orig)
>>> copy is orig
False
>>> copy.damping == orig.damping
True
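To show what "clone an algorithm, but not its fitted data" means in practice, here is a self-contained sketch that does not require LensKit. It assumes the scikit-learn convention that constructor parameters are stored as attributes of the same name. ToyBias and sketch_clone are hypothetical names for this example, and the approach is a simplification rather than the code LensKit or scikit-learn actually uses.

```python
import inspect


class ToyBias:
    """Hypothetical algorithm: one hyperparameter plus state learned by fit()."""

    def __init__(self, damping=0.0):
        self.damping = damping

    def fit(self, ratings):
        # Fitted state lives in an attribute that is not a constructor argument.
        self.mean_ = sum(ratings) / len(ratings)
        return self


def sketch_clone(algo):
    """Rebuild `algo` from its constructor parameters, dropping fitted state."""
    params = inspect.signature(type(algo).__init__).parameters
    kwargs = {name: getattr(algo, name) for name in params if name != "self"}
    return type(algo)(**kwargs)


orig = ToyBias(damping=5.0).fit([3.0, 4.0, 5.0])
fresh = sketch_clone(orig)
```

Here fresh carries the same damping value as orig but has no mean_ attribute, so it must be fitted again before use.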