2024 Releases

The 2024 release series is currently in development. No specific release date is set yet.

2024 will bring breaking changes to several LensKit APIs to improve ergonomics, correctness-by-default, and flexibility. It also adopts SPEC0, a standard for supported versions of scientific Python libraries, and changes the LensKit version number scheme to “SemCalVer”.

2024.1 (in progress)

The first 2024 release is currently in progress.

This document presents the highlights for this release. The full changelog for this release is available in the Git history and issue/PR milestone.

Significant Changes

2024.1 brings substantial changes to LensKit.

  • PyTorch. LensKit now uses PyTorch to implement most of its algorithms, instead of Numba-accelerated NumPy code. Algorithms using PyTorch are:

  • Dataset. LensKit now provides an abstraction for training data instead of working with Pandas data frames directly. It lets components reduce code duplication and recomputation and access data in multiple formats (Pandas, NumPy, and PyTorch), and it provides standardized structures such as mappings of user or item IDs to array indices. This also supersedes the old bespoke dataset loading support, with functions like load_movielens() to load standard datasets.

  • Many LensKit components (batch running, model training, etc.) now report progress with progress_api, and can be connected to TQDM or Enlighten.

  • Algorithms have been refactored into more consistent package locations, without the algorithms subpackage.

  • LensKit no longer has top-level exports (and is now in fact a namespace package). Classes and functions must be imported from appropriate subpackages.

  • The Popular recommender has been removed in favor of PopScore.
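
To make the "mappings of user or item IDs to array indices" idea concrete, here is a minimal standard-library sketch. The `Vocabulary` class and its method names are hypothetical and are not LensKit's actual Dataset API; the ratings are made-up toy data.

```python
class Vocabulary:
    """Hypothetical helper: maps arbitrary IDs to contiguous 0-based indices."""

    def __init__(self, ids):
        # Sort the distinct IDs so the index assignment is deterministic.
        self.ids = sorted(set(ids))
        self._index = {ident: i for i, ident in enumerate(self.ids)}

    def number(self, ident):
        """Return the array index for an ID (raises KeyError if unknown)."""
        return self._index[ident]

    def id(self, index):
        """Return the original ID stored at an array index."""
        return self.ids[index]

    def __len__(self):
        return len(self.ids)


# Toy (user, item, rating) interactions, invented for illustration.
ratings = [(42, "b", 4.0), (7, "a", 3.5), (42, "a", 5.0)]
users = Vocabulary(u for u, _, _ in ratings)
items = Vocabulary(i for _, i, _ in ratings)

# Convert ID-keyed records into index-keyed triples suitable for array storage.
coords = [(users.number(u), items.number(i), r) for u, i, r in ratings]
```

Computing such a mapping once and sharing it between components is the kind of duplication a common dataset abstraction can remove.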

New Features (incremental)

  • Added RBP top-N metric (⛙ 334).

  • Added command-line tool to fetch datasets (⛙ 347).
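
For readers unfamiliar with RBP (rank-biased precision), the following is a minimal sketch of the textbook definition, \((1 - p) \sum_k \mathrm{rel}_k \, p^{k-1}\) with ranks \(k\) starting at 1. The function name `rbp` and the `patience` parameter name are assumptions made for this sketch; LensKit's own implementation may differ in signature, defaults, and normalization.

```python
def rbp(recommended, relevant, patience=0.8):
    """Rank-biased precision of a ranked list against a set of relevant items.

    Sketch of the textbook formula only; not LensKit's implementation.
    """
    relevant = set(relevant)
    total = 0.0
    # enumerate() is 0-based, so patience ** rank equals p^(k-1) for 1-based k.
    for rank, item in enumerate(recommended):
        if item in relevant:
            total += patience ** rank
    return (1.0 - patience) * total


# Relevant items at ranks 1 and 3, patience 0.5: 0.5 * (1 + 0.25) = 0.625
score = rbp(["a", "b", "c"], {"a", "c"}, patience=0.5)
```

Higher patience values spread the weight deeper into the list; lower values concentrate it at the top ranks.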

Model Behavior Changes

Most models will exhibit some changes, hopefully mostly in performance, due to moving to PyTorch. There are some deliberate behavior changes in this new version, however, documented here.

  • ALS models only use Cholesky decomposition (previously selected with the erroneously-named method="lu" option); conjugate gradient and coordinate descent are no longer available. Cholesky decomposition is faster on PyTorch than it was with Numba, and is easier to maintain.

  • The default minimum similarity for UserUser is now \(10^{-6}\).

  • k-NN algorithms no longer support negative similarities; min_sim is clamped to be at least the smallest normal in 32-bit floating point (\(1.175 \times 10^{-38}\)).

  • The implicit bridge algorithms no longer look at rating values when they are present.
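
The min_sim clamping described above can be illustrated with a small sketch. The smallest positive normal number in IEEE 754 32-bit floating point is \(2^{-126}\), roughly \(1.175 \times 10^{-38}\). The helper names `clamp_min_sim` and `filter_neighbors` are hypothetical and only illustrate the described behavior; they are not LensKit functions.

```python
import math

# Smallest positive normal value in IEEE 754 binary32: 2**-126.
FLOAT32_SMALLEST_NORMAL = math.ldexp(1.0, -126)


def clamp_min_sim(min_sim):
    """Raise min_sim to the float32 smallest normal if it is below it."""
    return max(min_sim, FLOAT32_SMALLEST_NORMAL)


def filter_neighbors(sims, min_sim):
    """Keep only neighbors whose similarity meets the clamped threshold.

    Because the threshold is strictly positive, zero and negative
    similarities are always dropped.
    """
    threshold = clamp_min_sim(min_sim)
    return {n: s for n, s in sims.items() if s >= threshold}


# A negative min_sim no longer admits negative (or zero) similarities.
kept = filter_neighbors({"a": 0.3, "b": -0.2, "c": 0.0}, min_sim=-1.0)
```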

Bug Fixes

Dependencies and Maintenance

  • Bumped minimum supported dependencies as per SPEC0 (Python 3.10, NumPy 1.23, Pandas 1.5, SciPy 1.9).

  • Added support for Pandas 2 (⛙ 364) and Python 3.12.

  • Improved Apple testing to include vanilla Python and Apple Silicon (⛙ 366).

  • Updated the build environment, dependency setup, task running, and CI to be more consistent and maintainable.

  • Removed legacy random code in favor of seedbank (⛙ 351).

  • Code is now auto-formatted with Ruff.
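
As a rough way to check an environment against the minimum versions listed above, the following standard-library sketch compares installed distribution versions with those floors. The helper names are hypothetical, and the version parsing is deliberately simplistic (it reads only the leading numeric components); a real project would more likely rely on its packaging metadata or the `packaging` library.

```python
import sys
from importlib import metadata

# Minimums copied from the list above; keys are assumed PyPI distribution names.
MINIMUMS = {"numpy": "1.23", "pandas": "1.5", "scipy": "1.9"}
MIN_PYTHON = (3, 10)


def release_tuple(version):
    """Parse leading numeric components, e.g. '1.23.5rc1' -> (1, 23, 5)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1' and '1.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python %d.%d or newer is required" % MIN_PYTHON)
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # Not installed; nothing to compare against.
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems
```

Calling `check_environment()` returns an empty list when every installed package meets its floor.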