Matrix Utilities
LensKit provides several matrix utilities, since matrices are used heavily in recommendation algorithms.
Building Ratings Matrices
lenskit.matrix.sparse_ratings(ratings, scipy=False)
Convert a rating table to a sparse matrix of ratings.
Parameters:
- ratings (pandas.DataFrame) – a data table of (user, item, rating) triples.
- scipy – if True, return a SciPy matrix instead of a CSR.
Returns: a named tuple containing the sparse matrix, user index, and item index.
Return type:
class lenskit.matrix.RatingMatrix
A rating matrix with associated indices.
- matrix (CSR or scipy.sparse.csr_matrix) – the rating matrix, with users on rows and items on columns.
- users (pandas.Index) – mapping from user IDs to row numbers.
- items (pandas.Index) – mapping from item IDs to column numbers.
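The relationship between the rating table, the two indices, and the sparse matrix can be sketched with pandas and SciPy alone. This is an illustration of the idea, not LensKit's implementation; the column names `user`, `item`, and `rating` and the toy data are assumptions.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sps

# Toy (user, item, rating) table; the column names are an assumption.
ratings = pd.DataFrame({
    'user': [10, 10, 20, 30],
    'item': [7, 9, 7, 8],
    'rating': [4.0, 3.5, 5.0, 2.0],
})

# One index per axis: an ID's position in the index is its row/column number.
users = pd.Index(np.unique(ratings['user']), name='user')
items = pd.Index(np.unique(ratings['item']), name='item')

# Translate user and item IDs into row and column numbers.
rows = users.get_indexer(ratings['user'])
cols = items.get_indexer(ratings['item'])

# Users on rows, items on columns.
matrix = sps.csr_matrix(
    (ratings['rating'].values, (rows, cols)),
    shape=(len(users), len(items)),
)

# Look up the rating of user 20 for item 7 through the indices.
r = matrix[users.get_loc(20), items.get_loc(7)]
```

Keeping the indices next to the matrix is what lets results computed on row and column numbers be mapped back to the original user and item IDs.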
Compressed Sparse Row Matrices
We use CSR-format sparse matrices in quite a few places. Since SciPy’s sparse matrices are not directly usable from Numba, we have implemented a Numba-compiled CSR representation that can be used from accelerated algorithm implementations.
lenskit.matrix.csr_from_coo(rows, cols, vals, shape=None)
Create a CSR matrix from data in COO format.
Parameters:
- rows (array-like) – the row indices.
- cols (array-like) – the column indices.
- vals (array-like) – the data values; can be None.
- shape (tuple) – the array shape, or None to infer from row & column indices.
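For readers unfamiliar with the two layouts, the following sketch shows one way COO triples can be turned into CSR arrays with NumPy. The helper `coo_to_csr` is hypothetical and is not the LensKit function; it only mirrors the documented parameters (optional values, optional shape).

```python
import numpy as np

def coo_to_csr(rows, cols, vals=None, shape=None):
    """Hypothetical helper: build CSR arrays from COO data."""
    rows = np.asarray(rows, dtype=np.int32)
    cols = np.asarray(cols, dtype=np.int32)
    if shape is None:
        # Infer the shape from the largest row and column index.
        shape = (int(rows.max()) + 1, int(cols.max()) + 1)
    nrows, ncols = shape

    # rowptrs[i] is where row i starts; rowptrs[nrows] is the entry count.
    counts = np.bincount(rows, minlength=nrows)
    rowptrs = np.zeros(nrows + 1, dtype=np.int32)
    rowptrs[1:] = np.cumsum(counts)

    # A stable sort by row groups each row's entries together and
    # preserves their original order within the row.
    order = np.argsort(rows, kind='stable')
    colinds = cols[order]
    values = None if vals is None else np.asarray(vals, dtype=np.float64)[order]
    return shape, rowptrs, colinds, values

shape, rowptrs, colinds, values = coo_to_csr(
    rows=[2, 0, 0, 2], cols=[1, 0, 2, 0], vals=[5.0, 1.0, 2.0, 4.0])

# Structure-only matrix: no value array is produced.
structure_only = coo_to_csr(rows=[0, 1], cols=[1, 0])
```

Row 1 has no entries here, so its start and end pointers are equal.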
lenskit.matrix.csr_from_scipy(mat, copy=True)
Convert a SciPy sparse matrix to an internal CSR.
Parameters:
- mat (scipy.sparse.spmatrix) – a SciPy sparse matrix.
- copy (bool) – if False, reuse the SciPy storage if possible.
Returns: a CSR matrix.
Return type:
lenskit.matrix.csr_to_scipy(mat)
Convert a CSR matrix to a SciPy scipy.sparse.csr_matrix.
Parameters: mat (CSR) – a CSR matrix.
Returns: a SciPy sparse matrix with the same data. It shares storage with the input matrix (mat).
Return type: scipy.sparse.csr_matrix
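Shared storage has a practical consequence: in-place changes on one side are visible on the other. The sketch below demonstrates this with plain NumPy arrays and SciPy's three-array constructor. It does not call LensKit, and the array names are illustrative.

```python
import numpy as np
import scipy.sparse as sps

# CSR arrays for the 2x3 matrix [[1, 0, 2], [0, 3, 0]].
rowptrs = np.array([0, 2, 3], dtype=np.int32)
colinds = np.array([0, 2, 1], dtype=np.int32)
values = np.array([1.0, 2.0, 3.0])

# With the default copy=False, SciPy reuses arrays whose dtypes already fit.
smat = sps.csr_matrix((values, colinds, rowptrs), shape=(2, 3))

shared = np.shares_memory(smat.data, values)

# Because the storage is shared, an in-place change to the value array
# shows up in the SciPy matrix as well.
values[0] = 10.0
```

If independent data is needed, copying the result (for example with `smat.copy()`) avoids this aliasing.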
lenskit.matrix.csr_rowinds(csr)
Get the row indices for a CSR matrix.
Parameters: csr (CSR) – a CSR matrix.
Returns: the row index array for the CSR matrix.
Return type: np.ndarray
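CSR stores row membership implicitly in the row pointers, so a per-entry row index array has to be expanded from them. A minimal NumPy sketch of that expansion follows; the example pointers are made up.

```python
import numpy as np

rowptrs = np.array([0, 2, 2, 5])   # 3 rows; row 1 is empty
nrows = len(rowptrs) - 1

# Row i owns rowptrs[i+1] - rowptrs[i] entries, so repeat i that many times.
rowinds = np.repeat(np.arange(nrows), np.diff(rowptrs))
```

The result lines up element-for-element with the column index and value arrays, which gives back the COO view of the matrix.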
lenskit.matrix.csr_save(csr: numba.jitclass.base.CSR, prefix=None)
Extract data needed to save a CSR matrix. This is intended to be used with, for example, numpy.savez() to save a matrix:

    np.savez_compressed('file.npz', **csr_save(csr))

The prefix allows multiple matrices to be saved in a single file:

    data = {}
    data.update(csr_save(m1, prefix='m1'))
    data.update(csr_save(m2, prefix='m2'))
    np.savez_compressed('file.npz', **data)

Parameters:
Returns: a dictionary of data to save the matrix.
Return type:
lenskit.matrix.csr_load(data, prefix=None)
Rematerialize a CSR matrix from loaded data. The inverse of csr_save().
Parameters:
- data (dict-like) – the input data.
- prefix (str) – the prefix for the data keys.
Returns: the matrix described by data.
Return type:
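The save/load round trip with prefixes can be sketched without LensKit. The helpers `save_arrays` and `load_arrays` below are hypothetical stand-ins for `csr_save()` and `csr_load()`, and the key names and the `prefix_key` naming scheme are assumptions, not the library's actual format.

```python
import io
import numpy as np

def save_arrays(rowptrs, colinds, values, prefix=None):
    """Hypothetical stand-in for csr_save(): pack arrays into a dict."""
    data = {'rowptrs': rowptrs, 'colinds': colinds, 'values': values}
    if prefix is not None:
        # Prefixed keys let several matrices share one file.
        data = {prefix + '_' + k: v for k, v in data.items()}
    return data

def load_arrays(data, prefix=None):
    """Hypothetical stand-in for csr_load(): read the arrays back."""
    key = (lambda k: k) if prefix is None else (lambda k: prefix + '_' + k)
    return data[key('rowptrs')], data[key('colinds')], data[key('values')]

m1 = (np.array([0, 1, 2]), np.array([1, 0]), np.array([1.0, 2.0]))
m2 = (np.array([0, 2]), np.array([0, 1]), np.array([3.0, 4.0]))

data = {}
data.update(save_arrays(*m1, prefix='m1'))
data.update(save_arrays(*m2, prefix='m2'))

buf = io.BytesIO()   # in-memory file, to keep the example self-contained
np.savez_compressed(buf, **data)
buf.seek(0)

loaded = np.load(buf)
ptrs2, inds2, vals2 = load_arrays(loaded, prefix='m2')
```

The object returned by `np.load` supports key lookup, which is why the loader only needs a dict-like input.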
class lenskit.matrix.CSR(nrows, ncols, nnz, ptrs, inds, vals)
Simple compressed sparse row matrix. This is like scipy.sparse.csr_matrix, with a few useful differences:
- It is a Numba jitclass, so it can be directly used from Numba-optimized functions.
- The value array is optional, for cases in which only the matrix structure is required.
- The value array, if present, is always double-precision.
You generally don’t want to create this class yourself. Instead, use one of the related utility functions.
- rowptrs (numpy.ndarray) – the row pointers.
- colinds (numpy.ndarray) – the column indices.
- values (numpy.ndarray) – the values.
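To show how the three arrays fit together, here is a small sketch that reads rows and computes a matrix-vector product from plain NumPy arrays. The variable names follow the attributes above, but the arrays are built by hand rather than taken from a `CSR` object, and `row_entries` is a hypothetical helper.

```python
import numpy as np

# CSR arrays for the 3x4 matrix
# [[0, 5, 0, 1],
#  [0, 0, 0, 0],
#  [2, 0, 3, 0]]
rowptrs = np.array([0, 2, 2, 4])
colinds = np.array([1, 3, 0, 2])
values = np.array([5.0, 1.0, 2.0, 3.0])

def row_entries(row):
    """Return the (column indices, values) stored for one row."""
    start, end = rowptrs[row], rowptrs[row + 1]
    return colinds[start:end], values[start:end]

cols0, vals0 = row_entries(0)
cols1, vals1 = row_entries(1)   # empty row: both slices have length 0

# Sparse matrix-vector product using only the three arrays.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([
    values[rowptrs[i]:rowptrs[i + 1]] @ x[colinds[rowptrs[i]:rowptrs[i + 1]]]
    for i in range(len(rowptrs) - 1)
])
```

Loops of this shape, over row pointer ranges, are the kind of code that benefits from a representation usable inside Numba-compiled functions.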