Utility Functions

Miscellaneous

Miscellaneous utility functions.

lenskit.util.clone(algo)

Clone an algorithm, but not its fitted data. This is like sklearn.base.clone(), but may not work on arbitrary scikit-learn estimators. LensKit algorithms are compatible with the scikit-learn clone(), however, so feel free to use that if you need more general capabilities.

This function is loosely derived from the scikit-learn implementation.

>>> from lenskit.util import clone
>>> from lenskit.algorithms.basic import Bias
>>> orig = Bias()
>>> copy = clone(orig)
>>> copy is orig
False
>>> copy.damping == orig.damping
True
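
To make "the algorithm, but not its fitted data" concrete, here is a minimal sketch of a parameter-only copy. It is not LensKit's implementation: ToyBias and clone_sketch are hypothetical names, and the sketch assumes the scikit-learn conventions of a get_params() method and a trailing underscore on fitted attributes.

```python
class ToyBias:
    """Hypothetical stand-in for an algorithm; not a LensKit class."""

    def __init__(self, damping=0.0):
        self.damping = damping  # hyperparameter, set at construction
        self.mean_ = None       # fitted state, only set by fit()

    def get_params(self):
        # Report constructor arguments only, never fitted state.
        return {'damping': self.damping}

    def fit(self, ratings):
        self.mean_ = sum(ratings) / len(ratings)
        return self


def clone_sketch(algo):
    # Rebuild the object from its constructor parameters alone,
    # so nothing learned by fit() is carried over.
    return type(algo)(**algo.get_params())


orig = ToyBias(damping=5.0).fit([3.0, 4.0, 5.0])
copy = clone_sketch(orig)
```

In this sketch, copy keeps the damping setting of orig but has no fitted mean, so it has to be fitted again before use.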

lenskit.util.fspath(path)

Backport of the os.fspath() function (added in Python 3.6) for Python 3.5.
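
As a rough illustration of what such a backport has to do, the sketch below converts path-like objects to str or bytes. It is not LensKit's code: fspath_sketch is a hypothetical name, and the pathlib fallback assumes that path objects predating PEP 519 may lack __fspath__.

```python
import pathlib


def fspath_sketch(path):
    # Strings and bytes are already usable as file system paths.
    if isinstance(path, (str, bytes)):
        return path
    # Objects implementing the PEP 519 protocol convert themselves.
    if hasattr(path, '__fspath__'):
        return path.__fspath__()
    # Assumed fallback for older pathlib objects without __fspath__.
    if isinstance(path, pathlib.PurePath):
        return str(path)
    raise TypeError('expected str, bytes, or path-like object, not '
                    + type(path).__name__)


as_text = fspath_sketch(pathlib.PurePosixPath('data') / 'ratings.csv')
```

On Python 3.6 and later, os.fspath() itself covers these cases, so a helper like this is only needed while 3.5 is supported.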

lenskit.util.load_ml_ratings(path='ml-latest-small')

Load the ratings from a modern MovieLens data set (ML-20M or one of the ‘latest’ data sets).

>>> from lenskit.util import load_ml_ratings
>>> load_ml_ratings().head()
   user  item  rating   timestamp
0     1    31     2.5  1260759144
1     1  1029     3.0  1260759179
2     1  1061     3.0  1260759182
3     1  1129     2.0  1260759185
4     1  1172     4.0  1260759205

Parameters

path – The path where the MovieLens data is unpacked.

Returns

The rating data, with user and item columns named properly for LensKit.

Return type

pandas.DataFrame
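
The loading and renaming step can be pictured with the sketch below. It is an assumption-laden illustration rather than the library's code: load_ratings_sketch is a hypothetical name, and the sketch assumes the usual MovieLens layout of a ratings.csv file with userId, movieId, rating, and timestamp columns.

```python
import pathlib
import tempfile

import pandas as pd


def load_ratings_sketch(path):
    # Assumes a ratings.csv file inside the unpacked data directory.
    ratings = pd.read_csv(pathlib.Path(path) / 'ratings.csv')
    # Rename the ID columns to the user/item names LensKit expects.
    return ratings.rename(columns={'userId': 'user', 'movieId': 'item'})


# Build a tiny stand-in data set so the example runs without a download.
with tempfile.TemporaryDirectory() as tmp:
    sample = 'userId,movieId,rating,timestamp\n1,31,2.5,1260759144\n'
    (pathlib.Path(tmp) / 'ratings.csv').write_text(sample)
    ratings = load_ratings_sketch(tmp)
```

The rating and timestamp columns pass through unchanged; only the two ID columns are renamed.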

lenskit.util.read_df_detect(path)

Read a Pandas data frame, auto-detecting the file format based on filename suffix. The following file types are supported:

CSV

File has suffix .csv, read with pandas.read_csv().

Parquet

File has suffix .parquet, .parq, or .pq, read with pandas.read_parquet().
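
The suffix-based dispatch described above might look like the following sketch. read_detect_sketch is a hypothetical name, and raising ValueError for an unrecognized suffix is an assumption; the documentation does not say how unsupported files are handled.

```python
import pathlib
import tempfile

import pandas as pd

PARQUET_SUFFIXES = {'.parquet', '.parq', '.pq'}


def read_detect_sketch(path):
    path = pathlib.Path(path)
    # Choose the reader from the file name suffix alone.
    if path.suffix == '.csv':
        return pd.read_csv(path)
    if path.suffix in PARQUET_SUFFIXES:
        # Needs a Parquet engine such as pyarrow to be installed.
        return pd.read_parquet(path)
    # Assumed behavior for anything else.
    raise ValueError('unsupported file type: ' + path.suffix)


# Round-trip a small frame through CSV to exercise the dispatch.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = pathlib.Path(tmp) / 'ratings.csv'
    pd.DataFrame({'user': [1], 'item': [31]}).to_csv(csv_path, index=False)
    frame = read_detect_sketch(csv_path)
```

Because only the suffix is inspected, a mislabeled file is handed to the wrong reader; the file contents are never sniffed in this sketch.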

lenskit.util.write_parquet(path, frame, append=False)

Write a Parquet file.

Parameters
  • path (pathlib.Path) – The path of the Parquet file to write.

  • frame (pandas.DataFrame) – The data to write.

  • append (bool) – Whether to append to the file or overwrite it.