Data Utilities

These are general-purpose data processing utilities.

Building Ratings Matrices

lenskit.data.matrix.sparse_ratings(ratings, *, type='scipy', layout='csr', users=None, items=None)

Convert a rating table to a sparse matrix of ratings.

Parameters:
  • ratings (pd.DataFrame) – A data table of (user, item, rating) triples.

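  • type (Literal['scipy', 'spmatrix', 'torch', 'structure']) –

    The type of matrix to create. Can be any of the following:

      'scipy' – a SciPy sparse array.
      'spmatrix' – a SciPy sparse matrix.
      'torch' – a PyTorch sparse tensor.
      'structure' – a CSRStructure holding only the sparsity structure, with no rating values.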

  • layout (Literal['csr', 'coo']) – The matrix layout to use.

  • users (Optional[pd.Index[Any]]) – An index of user IDs.

  • items (Optional[pd.Index[Any]]) – An index of item IDs.

Returns:

A named tuple containing the sparse matrix, the user index, and the item index.

Return type:

RatingMatrix
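The core of this conversion can be approximated with pandas and SciPy alone, which helps show how the indices and the matrix relate. This is a sketch under stated assumptions, not LensKit's implementation: `ratings_to_sparse_sketch` and `RatingMatrixSketch` are hypothetical names, the table is assumed to have columns named `user`, `item`, and `rating`, and only SciPy arrays in the 'csr' and 'coo' layouts are covered (there are no `users`/`items` arguments).

```python
from typing import NamedTuple, Union

import numpy as np
import pandas as pd
import scipy.sparse as sps


class RatingMatrixSketch(NamedTuple):
    """Hypothetical stand-in with the same fields as RatingMatrix."""

    matrix: Union[sps.csr_array, sps.coo_array]
    users: pd.Index
    items: pd.Index


def ratings_to_sparse_sketch(ratings: pd.DataFrame, layout: str = "csr") -> RatingMatrixSketch:
    """Convert (user, item, rating) rows to a SciPy sparse array (illustrative only).

    Assumes columns named "user", "item", and "rating".
    """
    # Index the distinct user and item IDs; np.unique returns them sorted.
    users = pd.Index(np.unique(ratings["user"]), name="user")
    items = pd.Index(np.unique(ratings["item"]), name="item")

    # Translate each ID into its row (user) or column (item) number.
    rows = users.get_indexer(ratings["user"])
    cols = items.get_indexer(ratings["item"])
    vals = ratings["rating"].to_numpy(dtype=np.float64)
    shape = (len(users), len(items))

    if layout == "csr":
        # Duplicate (user, item) pairs are summed when the CSR array is built.
        matrix = sps.csr_array((vals, (rows, cols)), shape=shape)
    elif layout == "coo":
        # COO keeps every input triple as a separate stored entry.
        matrix = sps.coo_array((vals, (rows, cols)), shape=shape)
    else:
        raise ValueError(f"unsupported layout: {layout!r}")
    return RatingMatrixSketch(matrix, users, items)


if __name__ == "__main__":
    ratings = pd.DataFrame(
        {"user": [10, 10, 20], "item": ["a", "b", "b"], "rating": [4.0, 3.0, 5.0]}
    )
    rm = ratings_to_sparse_sketch(ratings)
    # Rows follow rm.users (10, 20); columns follow rm.items ("a", "b").
    print(rm.matrix.toarray())
```

Building the indices first and then mapping IDs to positions with `Index.get_indexer` is what lets the matrix use dense 0-based row and column numbers while the indices keep the original IDs.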

class lenskit.data.matrix.RatingMatrix(matrix, users, items)

Bases: NamedTuple, Generic[M]

A rating matrix with associated indices.

Parameters:
  • matrix (M)

  • users (pd.Index[Any])

  • items (pd.Index[Any])

matrix: M

The rating matrix, with users on rows and items on columns.

users: pd.Index[Any]

Mapping from user IDs to row numbers.

items: pd.Index[Any]

Mapping from item IDs to column numbers.
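Because `users` and `items` map IDs to row and column numbers, a single rating can be looked up by translating both IDs and then searching the row's stored entries, and column numbers can be mapped back to item IDs by indexing `items`. The sketch below assumes the matrix is a SciPy CSR array; `RatingMatrixStandIn`, `lookup_rating`, and `rated_item_ids` are hypothetical names used so the example runs without LensKit.

```python
from typing import NamedTuple, Optional

import numpy as np
import pandas as pd
import scipy.sparse as sps


class RatingMatrixStandIn(NamedTuple):
    """Hypothetical stand-in with RatingMatrix's field names (assumes a SciPy CSR matrix)."""

    matrix: sps.csr_array
    users: pd.Index
    items: pd.Index


def lookup_rating(rm: RatingMatrixStandIn, user_id, item_id) -> Optional[float]:
    """Return the stored rating for (user_id, item_id), or None if none is stored."""
    # The indices translate external IDs into row and column numbers.
    row = rm.users.get_loc(user_id)
    col = rm.items.get_loc(item_id)
    # In CSR layout, row `row` owns the slice indptr[row]:indptr[row + 1].
    start, end = rm.matrix.indptr[row], rm.matrix.indptr[row + 1]
    hits = np.flatnonzero(rm.matrix.indices[start:end] == col)
    if hits.size == 0:
        return None
    return float(rm.matrix.data[start + hits[0]])


def rated_item_ids(rm: RatingMatrixStandIn, user_id) -> list:
    """Map the column numbers stored in a user's row back to item IDs."""
    row = rm.users.get_loc(user_id)
    cols = rm.matrix.indices[rm.matrix.indptr[row]:rm.matrix.indptr[row + 1]]
    return list(rm.items[cols])


if __name__ == "__main__":
    rm = RatingMatrixStandIn(
        sps.csr_array(np.array([[4.0, 3.0], [0.0, 5.0]])),
        pd.Index([10, 20], name="user"),
        pd.Index(["a", "b"], name="item"),
    )
    # Being a named tuple, the result can also be unpacked positionally.
    matrix, users, items = rm
    print(lookup_rating(rm, 10, "b"), rated_item_ids(rm, 20))
```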

class lenskit.data.matrix.CSRStructure(rowptrs, colinds, shape)

Bases: NamedTuple

Representation of the compressed sparse row structure of a sparse matrix, without any data values.

Parameters:
  • rowptrs (ndarray)

  • colinds (ndarray)

  • shape (tuple[int, int])

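rowptrs: ndarray

Row pointers (field 0). The column indices for row i are stored in colinds[rowptrs[i]:rowptrs[i + 1]], so this array has one more entry than the matrix has rows.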

colinds: ndarray

Column indices of the stored entries (field 1), grouped by row in row order.

shape: tuple[int, int]

The matrix dimensions as (rows, columns) (field 2).
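A structure of this kind can be built from (row, column) coordinate pairs by counting entries per row to get the row pointers and sorting the column indices by row. The sketch below uses only NumPy; `CSRStructureSketch`, `structure_from_pairs`, and `row_columns` are hypothetical names, and sorting columns within each row is a choice made here, not something the documentation above specifies.

```python
from typing import NamedTuple

import numpy as np


class CSRStructureSketch(NamedTuple):
    """Hypothetical stand-in with the same fields as CSRStructure."""

    rowptrs: np.ndarray
    colinds: np.ndarray
    shape: tuple[int, int]


def structure_from_pairs(rows, cols, shape: tuple[int, int]) -> CSRStructureSketch:
    """Build a value-free CSR structure from (row, column) coordinate pairs."""
    rows = np.asarray(rows, dtype=np.int64)
    cols = np.asarray(cols, dtype=np.int64)
    nrows, ncols = shape
    if rows.shape != cols.shape:
        raise ValueError("rows and cols must have the same length")
    if rows.size and (rows.min() < 0 or rows.max() >= nrows or cols.min() < 0 or cols.max() >= ncols):
        raise ValueError("coordinates out of bounds")

    # Count entries per row; the running total gives each row's start offset.
    counts = np.bincount(rows, minlength=nrows)
    rowptrs = np.zeros(nrows + 1, dtype=np.int64)
    np.cumsum(counts, out=rowptrs[1:])

    # Order entries by row (primary key), then by column, and keep only the columns.
    order = np.lexsort((cols, rows))
    return CSRStructureSketch(rowptrs, cols[order], shape)


def row_columns(s: CSRStructureSketch, i: int) -> np.ndarray:
    """Column indices of the stored entries in row i."""
    return s.colinds[s.rowptrs[i]:s.rowptrs[i + 1]]


if __name__ == "__main__":
    s = structure_from_pairs([0, 0, 2], [3, 1, 0], (3, 4))
    for i in range(s.shape[0]):
        print(i, row_columns(s, i).tolist())
```

For comparison, a SciPy CSR array carries the same three pieces of information in its `indptr`, `indices`, and `shape` attributes, alongside a `data` array of values that a structure-only representation omits.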