Utility Functions

Miscellaneous utility functions.

Matrix Utilities

We provide several matrix-related utilities, because matrices are used heavily in recommendation algorithms.

Building Ratings Matrices

lenskit.matrix.sparse_ratings(ratings, scipy=False)

Convert a rating table to a sparse matrix of ratings.

Parameters:
  • ratings (pandas.DataFrame) – a data table of (user, item, rating) triples.
  • scipy (bool) – if True, return a SciPy matrix instead of a CSR.
Returns:

a named tuple containing the sparse matrix, user index, and item index.

Return type:

RatingMatrix
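As a rough illustration of what sparse_ratings computes, the sketch below builds the same three pieces (a sparse matrix plus user and item indices) with pandas and SciPy, roughly corresponding to the scipy=True case. It is not LensKit's implementation: the helper `sparse_ratings_sketch`, the `RatingMatrixSketch` tuple, and the column names `user`, `item`, and `rating` are assumptions made for this example.

```python
from typing import NamedTuple

import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


class RatingMatrixSketch(NamedTuple):
    """Hypothetical stand-in for RatingMatrix: matrix plus ID indices."""
    matrix: csr_matrix
    users: pd.Index
    items: pd.Index


def sparse_ratings_sketch(ratings: pd.DataFrame) -> RatingMatrixSketch:
    # Assumes the columns are named 'user', 'item', and 'rating'.
    users = pd.Index(ratings['user'].unique())
    items = pd.Index(ratings['item'].unique())
    # Map each ID to its row or column number.
    rows = users.get_indexer(ratings['user'])
    cols = items.get_indexer(ratings['item'])
    vals = ratings['rating'].to_numpy(dtype=np.float64)
    # (row, col, value) triples -> CSR; SciPy sums any duplicate pairs.
    mat = csr_matrix((vals, (rows, cols)), shape=(len(users), len(items)))
    return RatingMatrixSketch(mat, users, items)


ratings = pd.DataFrame({
    'user': [10, 10, 20, 30],
    'item': ['a', 'b', 'b', 'c'],
    'rating': [4.0, 3.5, 5.0, 2.0],
})
rm = sparse_ratings_sketch(ratings)
print(rm.matrix.shape)  # (3, 3)
```

Users and items are numbered in order of first appearance here; the real function may order them differently.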

class lenskit.matrix.RatingMatrix

A rating matrix with associated indices.

matrix

The rating matrix, with users on rows and items on columns.

Type: CSR or scipy.sparse.csr_matrix

users

mapping from user IDs to row numbers.

Type: pandas.Index

items

mapping from item IDs to column numbers.

Type: pandas.Index
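Because users and items are pandas.Index objects, translating between IDs and matrix positions is an ordinary Index lookup. A minimal sketch, assuming the SciPy form of the matrix (scipy=True) and using small hand-built data in place of a real sparse_ratings result; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Hand-built stand-ins for RatingMatrix.matrix, .users, and .items.
users = pd.Index([10, 20, 30])
items = pd.Index(['a', 'b', 'c'])
matrix = csr_matrix(np.array([
    [4.0, 3.5, 0.0],
    [0.0, 5.0, 0.0],
    [0.0, 0.0, 2.0],
]))

# User ID -> row number, then slice that row's entries via the row pointers.
row = users.get_loc(10)
start, end = matrix.indptr[row], matrix.indptr[row + 1]
# Column numbers -> item IDs.
user_ratings = dict(zip(items[matrix.indices[start:end]], matrix.data[start:end].tolist()))
print(user_ratings)  # {'a': 4.0, 'b': 3.5}
```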

Compressed Sparse Row Matrices

We use CSR-format sparse matrices in many places. Since SciPy’s sparse matrices are not directly usable from Numba, we have implemented a Numba-compiled CSR representation that can be called directly from accelerated algorithm implementations.

lenskit.matrix.csr_from_coo(rows, cols, vals, shape=None)

Create a CSR matrix from data in COO format.

Parameters:
  • rows (array-like) – the row indices.
  • cols (array-like) – the column indices.
  • vals (array-like) – the data values; can be None.
  • shape (tuple) – the array shape, or None to infer from row & column indices.
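Converting COO triples to CSR amounts to grouping entries by row and turning per-row counts into row pointers. The sketch below shows that idea in plain NumPy; `coo_to_csr_arrays` is a hypothetical name, and this is not necessarily how csr_from_coo is implemented (for instance, it neither sorts columns within a row nor merges duplicate entries).

```python
import numpy as np


def coo_to_csr_arrays(rows, cols, vals=None, shape=None):
    """Return (rowptrs, colinds, values) for COO input; values may be None."""
    rows = np.asarray(rows, dtype=np.intp)
    cols = np.asarray(cols, dtype=np.intp)
    if shape is None:
        # Infer the shape from the largest row and column index.
        shape = (int(rows.max()) + 1, int(cols.max()) + 1)
    nrows, _ = shape

    # A stable sort keeps the original order of entries within each row.
    order = np.argsort(rows, kind='stable')
    counts = np.bincount(rows, minlength=nrows)
    rowptrs = np.zeros(nrows + 1, dtype=np.intp)
    np.cumsum(counts, out=rowptrs[1:])

    colinds = cols[order]
    values = None if vals is None else np.asarray(vals, dtype=np.float64)[order]
    return rowptrs, colinds, values


rowptrs, colinds, values = coo_to_csr_arrays([1, 0, 1], [2, 0, 1], [3.0, 1.0, 2.0])
print(rowptrs, colinds, values)  # [0 1 3] [0 2 1] [1. 3. 2.]
```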

lenskit.matrix.csr_from_scipy(mat, copy=True)

Convert a scipy sparse matrix to an internal CSR.

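Parameters:
  • mat (scipy.sparse.spmatrix) – a SciPy sparse matrix.
  • copy (bool) – whether to copy the SciPy matrix's arrays; if False, reuse its storage where possible.
Returns: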

a CSR matrix.

Return type:

CSR
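A SciPy csr_matrix already stores the three arrays a CSR needs, under different names: indptr, indices, and data. The sketch below shows that correspondence and what the copy flag plausibly controls. The dictionary keys mirror the CSR attribute names documented below; the helper `scipy_to_csr_fields` is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix


def scipy_to_csr_fields(mat, copy=True):
    """Pull CSR-style fields out of a SciPy sparse matrix (hypothetical helper)."""
    mat = mat.tocsr()  # returns the same object if it is already CSR
    fields = {
        'nrows': mat.shape[0],
        'ncols': mat.shape[1],
        'nnz': mat.nnz,
        'rowptrs': mat.indptr,
        'colinds': mat.indices,
        'values': mat.data.astype(np.float64, copy=False),  # CSR values are doubles
    }
    if copy:
        # Detach from SciPy's storage so later edits to either side stay separate.
        for key in ('rowptrs', 'colinds', 'values'):
            fields[key] = fields[key].copy()
    return fields


sp = csr_matrix(np.array([[0.0, 1.0], [2.0, 0.0]]))
shared = scipy_to_csr_fields(sp, copy=False)
copied = scipy_to_csr_fields(sp)
print(np.shares_memory(shared['values'], sp.data))  # True
print(np.shares_memory(copied['values'], sp.data))  # False
```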

lenskit.matrix.csr_to_scipy(mat)

Convert a CSR matrix to a SciPy scipy.sparse.csr_matrix.

Parameters: mat (CSR) – a CSR matrix.
Returns: a SciPy sparse matrix with the same data; it shares storage with mat.
Return type: scipy.sparse.csr_matrix
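SciPy can wrap existing row-pointer, column-index, and value arrays without copying them, which is how a conversion can share storage with the original matrix. A minimal sketch with plain NumPy arrays standing in for a CSR's rowptrs, colinds, and values; SciPy may still copy index arrays if it decides to change their integer dtype, so this example uses 32-bit indices and relies only on the value array being shared.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Stand-ins for CSR.rowptrs, CSR.colinds, and CSR.values of a 2x3 matrix.
rowptrs = np.array([0, 2, 3], dtype=np.int32)
colinds = np.array([0, 2, 1], dtype=np.int32)
values = np.array([1.0, 2.0, 3.0])

# The (data, indices, indptr) constructor does not copy by default.
sp = csr_matrix((values, colinds, rowptrs), shape=(2, 3))

# Shared storage: writing through the original array is visible in SciPy.
values[0] = 10.0
print(sp.toarray())
```

The flip side of sharing is that mutating either object changes both, so copy explicitly if the two need to diverge.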

lenskit.matrix.csr_rowinds(csr)

Get the row indices for a CSR matrix.

Parameters: csr (CSR) – a CSR matrix.
Returns: an array giving the row index of each stored entry in the CSR matrix.
Return type: numpy.ndarray
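Row indices can be recovered from the row pointers alone, since row i owns rowptrs[i+1] - rowptrs[i] consecutive entries. A NumPy sketch of that expansion; the function name is hypothetical, and LensKit's version may be implemented differently.

```python
import numpy as np


def rowinds_from_rowptrs(rowptrs):
    """Expand CSR row pointers into one row index per stored entry."""
    rowptrs = np.asarray(rowptrs)
    nrows = len(rowptrs) - 1
    # Repeat each row number once for each entry that row contains.
    return np.repeat(np.arange(nrows), np.diff(rowptrs))


print(rowinds_from_rowptrs([0, 2, 2, 3]))  # [0 0 2]
```

Paired with colinds, the result gives COO-style (row, column) coordinates for every entry.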

lenskit.matrix.csr_save(csr, prefix=None)

Extract the data needed to save a CSR matrix. This is intended to be used with, for example, numpy.savez() or numpy.savez_compressed() to save a matrix:

np.savez_compressed('file.npz', **csr_save(csr))

The prefix allows multiple matrices to be saved in a single file:

import numpy as np
from lenskit.matrix import csr_save

# m1 and m2 are existing CSR matrices.
data = {}
data.update(csr_save(m1, prefix='m1'))
data.update(csr_save(m2, prefix='m2'))
np.savez_compressed('file.npz', **data)

Parameters:
  • csr (CSR) – the matrix to save.
  • prefix (str) – the prefix for the data keys.
Returns:

a dictionary of data to save the matrix.

Return type:

dict
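The returned dictionary only needs enough arrays to rebuild the matrix. The sketch below assumes one plausible layout (the shape plus the three CSR arrays, with the value array omitted when absent) and prepends the prefix to each key. The key names and the helper `save_dict` are assumptions, not LensKit's documented format. It writes to an in-memory buffer so it runs without touching the file system.

```python
import io

import numpy as np


def save_dict(nrows, ncols, rowptrs, colinds, values=None, prefix=None):
    """Build a savez-ready dict for one CSR matrix (hypothetical key names)."""
    prefix = prefix or ''
    data = {
        prefix + 'shape': np.array([nrows, ncols]),
        prefix + 'rowptrs': np.asarray(rowptrs),
        prefix + 'colinds': np.asarray(colinds),
    }
    if values is not None:  # structure-only matrices have no value array
        data[prefix + 'values'] = np.asarray(values, dtype=np.float64)
    return data


# Two matrices in one file, kept apart by their key prefixes.
data = {}
data.update(save_dict(2, 3, [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], prefix='m1'))
data.update(save_dict(2, 2, [0, 1, 1], [1], prefix='m2'))

buf = io.BytesIO()
np.savez_compressed(buf, **data)
print(sorted(data))
```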

lenskit.matrix.csr_load(data, prefix=None)

Rematerialize a CSR matrix from loaded data. This is the inverse of csr_save().

Parameters:
  • data (dict-like) – the input data.
  • prefix (str) – the prefix for the data keys.
Returns:

the matrix described by data.

Return type:

CSR
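Loading reverses that process: look up each array under the prefix and rebuild the matrix, treating a missing value array as a structure-only matrix. The sketch below assumes a key layout of shape, rowptrs, colinds, and optional values, and returns the raw pieces rather than a CSR object; `load_from` and the key names are hypothetical.

```python
import io

import numpy as np


def load_from(data, prefix=None):
    """Rebuild (shape, rowptrs, colinds, values) from saved arrays (hypothetical keys)."""
    prefix = prefix or ''
    nrows, ncols = (int(x) for x in data[prefix + 'shape'])
    rowptrs = data[prefix + 'rowptrs']
    colinds = data[prefix + 'colinds']
    # A missing value array means the matrix only records structure.
    key = prefix + 'values'
    values = data[key] if key in data else None
    return (nrows, ncols), rowptrs, colinds, values


# Two matrices saved into one in-memory .npz file, then loaded back by prefix.
buf = io.BytesIO()
np.savez(buf,
         m1shape=[2, 3], m1rowptrs=[0, 2, 3], m1colinds=[0, 2, 1],
         m1values=[1.0, 2.0, 3.0],
         m2shape=[2, 2], m2rowptrs=[0, 1, 1], m2colinds=[1])
buf.seek(0)
with np.load(buf) as npz:
    shape1, ptrs1, inds1, vals1 = load_from(npz, prefix='m1')
    shape2, ptrs2, inds2, vals2 = load_from(npz, prefix='m2')
```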

class lenskit.matrix.CSR(nrows, ncols, nnz, ptrs, inds, vals)

Simple compressed sparse row matrix. This is like scipy.sparse.csr_matrix, with a couple of useful differences:

  • It is a Numba jitclass, so it can be directly used from Numba-optimized functions.
  • The value array is optional, for cases in which only the matrix structure is required.
  • The value array, if present, is always double-precision.

You generally don’t want to create this class yourself. Instead, use one of the related utility functions.
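To make the layout concrete, here is a plain-Python stand-in that mirrors the attributes listed below, including an optional double-precision value array. `CSRSketch` and its `row_cols`/`row_values` methods are hypothetical; it is a dataclass rather than a Numba jitclass, so it cannot be used from compiled code. Returning ones for a structure-only row is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class CSRSketch:
    nrows: int
    ncols: int
    nnz: int
    rowptrs: np.ndarray
    colinds: np.ndarray
    values: Optional[np.ndarray] = None  # None when only structure is stored

    def __post_init__(self):
        if self.values is not None:
            # Values, when present, are stored as double precision.
            self.values = np.asarray(self.values, dtype=np.float64)

    def row_cols(self, row: int) -> np.ndarray:
        """Column indices of the entries in one row."""
        return self.colinds[self.rowptrs[row]:self.rowptrs[row + 1]]

    def row_values(self, row: int) -> np.ndarray:
        """Values of one row; all ones for a structure-only matrix."""
        sp, ep = self.rowptrs[row], self.rowptrs[row + 1]
        if self.values is None:
            return np.ones(ep - sp)
        return self.values[sp:ep]


m = CSRSketch(2, 3, 3, np.array([0, 2, 3]), np.array([0, 2, 1]), [1, 2, 3])
print(m.row_cols(0), m.row_values(0))  # [0 2] [1. 2.]
```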

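nrows

the number of rows.

Type: int

ncols

the number of columns.

Type: int

nnz

the number of stored entries.

Type: int

rowptrs

the row pointers, of length nrows + 1; the entries of row i occupy positions rowptrs[i] through rowptrs[i+1] - 1 of colinds and values.

Type: numpy.ndarray

colinds

the column indices.

Type: numpy.ndarray

values

the values, if present.

Type: numpy.ndarray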
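As a worked example of how these attributes fit together, consider the 3 × 4 matrix below, which has an empty middle row. The CSR arrays are derived by hand from the definition above and checked by rebuilding the dense matrix from them; the variable names simply mirror the attribute names.

```python
import numpy as np

dense = np.array([
    [0.0, 5.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],  # an empty row: rowptrs[1] == rowptrs[2]
    [2.0, 0.0, 3.0, 0.0],
])

# Hand-derived CSR arrays for the matrix above.
nrows, ncols, nnz = 3, 4, 4
rowptrs = np.array([0, 2, 2, 4])
colinds = np.array([1, 3, 0, 2])
values = np.array([5.0, 1.0, 2.0, 3.0])

# Row i's entries live in positions rowptrs[i] through rowptrs[i + 1] - 1.
rebuilt = np.zeros((nrows, ncols))
for i in range(nrows):
    for k in range(rowptrs[i], rowptrs[i + 1]):
        rebuilt[i, colinds[k]] = values[k]

print(np.array_equal(rebuilt, dense))  # True
```

Note that rowptrs[-1] always equals nnz, which is also the length of colinds and values.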