Matrix Utilities

We provide several matrix utilities, since recommendation algorithms rely so heavily on matrices.

Building Ratings Matrices

lenskit.matrix.sparse_ratings(ratings, scipy=False)

Convert a rating table to a sparse matrix of ratings.

Parameters
  • ratings (pandas.DataFrame) – a data table of (user, item, rating) triples.

  • scipy (bool) – if True, return a scipy.sparse.csr_matrix instead of a CSR.

Returns

a named tuple containing the sparse matrix, user index, and item index.

Return type

RatingMatrix

class lenskit.matrix.RatingMatrix

A rating matrix with associated indices.

matrix

The rating matrix, with users on rows and items on columns.

Type

CSR or scipy.sparse.csr_matrix

users

mapping from user IDs to row numbers.

Type

pandas.Index

items

mapping from item IDs to column numbers.

Type

pandas.Index
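
To illustrate how user and item IDs map to row and column numbers, here is a minimal NumPy sketch. It is not LensKit's implementation: the toy data is made up, numpy.unique stands in for the pandas.Index objects described above, and the row ordering LensKit chooses may differ.

```python
import numpy as np

# Hypothetical toy data: (user, item, rating) triples.
users = np.array([10, 10, 42, 7])
items = np.array([500, 501, 500, 502])
ratings = np.array([4.0, 3.5, 5.0, 2.0])

# np.unique returns the sorted distinct IDs plus, for each triple,
# the position of its ID in that sorted array (the row/column number).
user_index, row_nums = np.unique(users, return_inverse=True)
item_index, col_nums = np.unique(items, return_inverse=True)

# (row_nums, col_nums, ratings) is now a COO-format rating matrix.
shape = (len(user_index), len(item_index))
```

Here user 7 lands on row 0, user 10 on row 1, and user 42 on row 2; the index arrays are what let us translate matrix positions back to the original IDs.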

Compressed Sparse Row Matrices

We use CSR-format sparse matrices in quite a few places. Since SciPy’s sparse matrices are not directly usable from Numba, we have implemented a Numba-compiled CSR representation that can be used from accelerated algorithm implementations.

class lenskit.matrix.CSR(nrows=None, ncols=None, nnz=None, ptrs=None, inds=None, vals=None, N=None)

Simple compressed sparse row matrix. This is like scipy.sparse.csr_matrix, with a couple of useful differences:

  • It is backed by a Numba jitclass, so it can be directly used from Numba-optimized functions.

  • The value array is optional, for cases in which only the matrix structure is required.

  • The value array, if present, is always double-precision.

You generally should not construct this class directly. Instead, use one of its class methods (empty, from_coo, or from_scipy).

If you need to pass an instance off to a Numba-compiled function, use N:

_some_numba_fun(csr.N)

We use the indirection between this and the Numba jitclass so that the main CSR implementation can be pickled, and so that we can have class and instance methods that are not compatible with jitclass but which are useful from interpreted code.

N

the Numba jitclass backing (has the same attributes and most methods).

Type

_CSR

nrows

the number of rows.

Type

int

ncols

the number of columns.

Type

int

nnz

the number of stored entries.

Type

int

rowptrs

the row pointers.

Type

numpy.ndarray

colinds

the column indices.

Type

numpy.ndarray

values

the values.

Type

numpy.ndarray
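
As a rough illustration of how these attributes fit together, the following sketch writes out a small matrix in the conventional CSR layout using plain NumPy arrays. The variable names mirror the attributes above, but this is a hand-built example under that assumption, not a CSR instance.

```python
import numpy as np

dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 0.0, 0.0],
                  [0.0, 3.0, 4.0]])

# The same matrix in CSR form (written out by hand).
nrows, ncols = 3, 3
rowptrs = np.array([0, 2, 2, 4], dtype=np.int32)   # nrows + 1 entries
colinds = np.array([0, 2, 1, 2], dtype=np.int32)   # one per stored entry
values = np.array([1.0, 2.0, 3.0, 4.0])            # one per stored entry
nnz = len(colinds)

# Rebuild the dense matrix from the three arrays to check the layout.
rebuilt = np.zeros((nrows, ncols))
for r in range(nrows):
    s, e = rowptrs[r], rowptrs[r + 1]
    rebuilt[r, colinds[s:e]] = values[s:e]
```

Row 1 is empty, so its two row pointers are equal.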

classmethod empty(shape, row_nnzs, *, rpdtype=numpy.int32)

Create an empty CSR matrix.

Parameters
  • shape (tuple) – the array shape (rows, cols).

  • row_nnzs (array-like) – the number of nonzero entries for each row.
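
The per-row counts are enough to fix the matrix structure in advance. A minimal sketch of that idea in NumPy follows; it assumes the conventional CSR layout, and whether LensKit zero-fills or leaves the storage uninitialized is not stated here.

```python
import numpy as np

shape = (3, 4)
row_nnzs = np.array([2, 0, 3])

# Row pointers are the running total of the per-row counts, with a leading 0.
rowptrs = np.zeros(shape[0] + 1, dtype=np.int32)
rowptrs[1:] = np.cumsum(row_nnzs)

# Storage for the entries, to be filled in later by the caller.
nnz = int(rowptrs[-1])
colinds = np.zeros(nnz, dtype=np.int32)
values = np.zeros(nnz)
```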

filter_nnzs(filt)

Filter the values along the full NNZ axis.

Parameters

filt (ndarray) – a logical array of length nnz that indicates the values to keep.

Returns

The filtered sparse matrix.

Return type

CSR
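
Filtering along the NNZ axis drops stored entries, so the row pointers have to be rebuilt. The sketch below shows one way to do that with NumPy arrays in the conventional CSR layout; it illustrates the idea and is not LensKit's code.

```python
import numpy as np

rowptrs = np.array([0, 2, 2, 4])
colinds = np.array([0, 2, 1, 2])
values = np.array([1.0, 2.0, 3.0, 4.0])
nrows = len(rowptrs) - 1

# Keep only entries whose value is at least 2 (mask has length nnz).
filt = values >= 2.0

# Count the surviving entries in each row, then rebuild the row pointers.
rowinds = np.repeat(np.arange(nrows), np.diff(rowptrs))
kept_per_row = np.bincount(rowinds[filt], minlength=nrows)
new_rowptrs = np.concatenate([[0], np.cumsum(kept_per_row)])
new_colinds = colinds[filt]
new_values = values[filt]
```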

classmethod from_coo(rows, cols, vals, shape=None, rpdtype=numpy.int32)

Create a CSR matrix from data in COO format.

Parameters
  • rows (array-like) – the row indices.

  • cols (array-like) – the column indices.

  • vals (array-like) – the data values; can be None.

  • shape (tuple) – the array shape, or None to infer from row & column indices.
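
For readers unfamiliar with the conversion, here is a small NumPy sketch of turning COO triples into CSR arrays. It assumes the conventional CSR layout and infers the shape from the largest indices; LensKit's own implementation may order entries within a row differently.

```python
import numpy as np

rows = np.array([2, 0, 2, 0])
cols = np.array([1, 2, 2, 0])
vals = np.array([3.0, 2.0, 4.0, 1.0])

# Infer the shape from the largest indices, as when shape is None.
nrows, ncols = int(rows.max()) + 1, int(cols.max()) + 1

# A stable sort by row groups each row's entries together.
order = np.argsort(rows, kind="stable")
colinds = cols[order]
values = vals[order]

# Row pointers: cumulative count of entries per row, with a leading 0.
counts = np.bincount(rows, minlength=nrows)
rowptrs = np.concatenate([[0], np.cumsum(counts)])
```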

classmethod from_scipy(mat, copy=True)

Convert a scipy sparse matrix to an internal CSR.

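Parameters
  • mat – a SciPy sparse matrix.

  • copy (bool) – whether to copy the SciPy matrix's arrays rather than reuse them.
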
Returns

a CSR matrix.

Return type

CSR

normalize_rows(normalization)

Normalize the rows of the matrix.

Note

The normalization ignores missing values instead of treating them as 0.

Note

This method is not available from Numba.

Parameters

normalization (str) –

The normalization to perform. Can be one of:

  • 'center' - center rows about the mean

  • 'unit' - convert rows to a unit vector

Returns

The normalization values for each row.

Return type

numpy.ndarray
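
To make the note about missing values concrete, the sketch below centers each row about the mean of its stored values only, using NumPy arrays in the conventional CSR layout. Reporting a mean of 0 for an empty row is an assumption of this sketch, not documented behavior.

```python
import numpy as np

rowptrs = np.array([0, 2, 2, 4])
values = np.array([1.0, 3.0, 2.0, 6.0])
nrows = len(rowptrs) - 1

# Center each row about the mean of its stored values only.
means = np.zeros(nrows)  # assumption: empty rows report a mean of 0
centered = values.copy()
for r in range(nrows):
    s, e = rowptrs[r], rowptrs[r + 1]
    if e > s:  # leave empty rows alone
        means[r] = values[s:e].mean()
        centered[s:e] -= means[r]
```

Row 0 stores the values 1 and 3, so its mean is 2 regardless of how many columns the matrix has. Treating the missing entries as 0 would instead pull the mean toward zero.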

row(row)

Return a row of this matrix as a dense ndarray.

Parameters

row (int) – the row index.

Returns

the row, with 0s in the place of missing values.

Return type

numpy.ndarray
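
A dense row can be reconstructed by scattering the stored values into a zero vector. The helper below is a hypothetical stand-in written against plain NumPy arrays in the conventional CSR layout, not the method itself.

```python
import numpy as np

ncols = 3
rowptrs = np.array([0, 2, 2, 4])
colinds = np.array([0, 2, 1, 2])
values = np.array([1.0, 2.0, 3.0, 4.0])

def dense_row(row):
    """Expand one CSR row into a dense vector (hypothetical helper)."""
    s, e = rowptrs[row], rowptrs[row + 1]
    out = np.zeros(ncols)
    out[colinds[s:e]] = values[s:e]
    return out
```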

row_cs(row)

Get the column indices for the stored values of a row.

row_extent(row)

Get the extent of a row in the underlying column index and value arrays.

Parameters

row (int) – the row index.

Returns

(s, e), where the row occupies positions \([s, e)\) in the CSR data.

Return type

tuple
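
The extent is what lets us slice a row's column indices and values out of the flat arrays. A minimal sketch with NumPy arrays in the conventional CSR layout follows; the extent helper is hypothetical.

```python
import numpy as np

rowptrs = np.array([0, 2, 2, 4])
colinds = np.array([0, 2, 1, 2])
values = np.array([1.0, 2.0, 3.0, 4.0])

def extent(row):
    """Half-open range [s, e) of a row's entries (hypothetical helper)."""
    return int(rowptrs[row]), int(rowptrs[row + 1])

s, e = extent(2)
row_cols = colinds[s:e]   # column indices stored for row 2
row_vals = values[s:e]    # matching values for row 2
```

An empty row has s equal to e, so both slices are empty.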

row_nnzs()

Get a vector of the number of nonzero entries in each row.

Note

This method is not available from Numba.

Returns

the number of nonzero entries in each row.

Return type

numpy.ndarray

row_vs(row)

Get the stored values of a row.

rowinds() → numpy.ndarray

Get the row indices from this array. Combined with colinds and values, this can form a COO-format sparse matrix.

Note

This method is not available from Numba.
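
The row indices are implicit in the row pointers, and can be recovered by repeating each row number once per stored entry. A NumPy sketch of that idea, assuming the conventional CSR layout:

```python
import numpy as np

rowptrs = np.array([0, 2, 2, 4])
colinds = np.array([0, 2, 1, 2])
values = np.array([1.0, 2.0, 3.0, 4.0])
nrows = len(rowptrs) - 1

# Entries per row are the gaps between consecutive row pointers.
row_counts = np.diff(rowptrs)

# Repeat each row number once per stored entry to get COO row indices.
rowinds = np.repeat(np.arange(nrows), row_counts)

# (rowinds, colinds, values) together are the COO triples.
coo = list(zip(rowinds.tolist(), colinds.tolist(), values.tolist()))
```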

sort_values()

Sort CSR rows in nonincreasing order by value.

Note

This method is not available from Numba.
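
Sorting by value happens within each row, and the column indices must be permuted along with the values. The sketch below shows this on NumPy arrays in the conventional CSR layout. It writes to copies; whether the method itself works in place is not stated here.

```python
import numpy as np

rowptrs = np.array([0, 3, 3, 5])
colinds = np.array([0, 1, 2, 0, 2])
values = np.array([2.0, 5.0, 3.0, 1.0, 4.0])

sorted_cols = colinds.copy()
sorted_vals = values.copy()
for r in range(len(rowptrs) - 1):
    s, e = rowptrs[r], rowptrs[r + 1]
    # Negate so that an ascending stable sort yields descending values.
    order = np.argsort(-values[s:e], kind="stable")
    sorted_cols[s:e] = colinds[s:e][order]
    sorted_vals[s:e] = values[s:e][order]
```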

subset_rows(begin, end)

Subset the rows in this matrix.

to_scipy()

Convert a CSR matrix to a scipy.sparse.csr_matrix, avoiding a copy where possible.

Parameters

self (CSR) – A CSR matrix.

Returns

A SciPy sparse matrix with the same data.

Return type

scipy.sparse.csr_matrix

transpose(values=True)

Transpose a CSR matrix.

Note

This method is not available from Numba.

Parameters

values (bool) – whether to include the values in the transpose.

Returns

the transpose of this matrix (or, equivalently, this matrix in CSC format).

Return type

CSR
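
One way to see why the transpose is the same data in CSC format: regroup the stored entries by column, and the old column pointers become the new row pointers. A NumPy sketch under the conventional CSR layout (not LensKit's implementation):

```python
import numpy as np

nrows, ncols = 3, 3
rowptrs = np.array([0, 2, 2, 4])
colinds = np.array([0, 2, 1, 2])
values = np.array([1.0, 2.0, 3.0, 4.0])

# Recover the row index of every stored entry.
rowinds = np.repeat(np.arange(nrows), np.diff(rowptrs))

# Group entries by column; the old columns become the new rows.
order = np.argsort(colinds, kind="stable")
t_colinds = rowinds[order]
t_values = values[order]

# New row pointers: cumulative count of entries per old column.
col_counts = np.bincount(colinds, minlength=ncols)
t_rowptrs = np.concatenate([[0], np.cumsum(col_counts)])
```

Dropping t_values gives the structure-only transpose, which corresponds to passing values=False.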

class lenskit.matrix._CSR(nrows, ncols, nnz, ptrs, inds, vals)

Internal implementation class for CSR. If you work with CSRs from Numba, you will use this.

Note that the values array is always present (unlike the Python shim), but is zero-length if no values are present. This eases Numba type-checking.