2024 Releases#
The 2024 release series is currently in development. No specific release date is set yet.
2024 will bring breaking changes to several LensKit APIs to improve ergonomics, correctness-by-default, and flexibility. It also adopts SPEC0, a standard for supported versions of scientific Python libraries, and changes the LensKit version number scheme to “SemCalVer”.
2024.1 (in progress)#
The first 2024 release is currently in progress.
This document presents the highlights for this release. The full changelog for this release is available in the Git history and issue/PR milestone.
Significant Changes#
2024.1 brings substantial changes to LensKit.
PyTorch. LensKit now uses PyTorch to implement most of its algorithms, instead of Numba-accelerated NumPy code. Algorithms using PyTorch are:
Dataset. LensKit now provides an abstraction for training data instead of working with Pandas data frames directly. It allows components to reduce code duplication and recomputation, access data in multiple formats (Pandas, NumPy, and PyTorch), and use standardized structures such as mappings of user or item IDs to array indices. This also supersedes the old bespoke dataset loading support, with functions like load_movielens() to load standard datasets.
Many LensKit components (batch running, model training, etc.) now report progress with progress_api, and can be connected to TQDM or Enlighten.
Algorithms have been refactored to more consistent package locations, without the algorithms subpackage.
LensKit no longer has top-level exports (and is now in fact a namespace package). Classes and functions must be imported from the appropriate subpackages.
The Popular recommender has been removed in favor of PopScore.
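As a rough illustration of the "mappings of user or item IDs to array indices" mentioned above, the sketch below builds contiguous 0-based indices in plain Python. It does not use the LensKit Dataset API: the names build_index and encode, and the sample ratings, are hypothetical and only show the underlying idea.

```python
# Hypothetical sketch: NOT LensKit's Dataset API, only the underlying idea
# of translating arbitrary user/item IDs into contiguous array positions.


def build_index(ids):
    """Map each distinct ID to a 0-based position, in sorted ID order."""
    return {ident: pos for pos, ident in enumerate(sorted(set(ids)))}


def encode(ratings):
    """Convert (user, item, rating) triples into index-based parallel lists."""
    user_index = build_index(u for u, _, _ in ratings)
    item_index = build_index(i for _, i, _ in ratings)
    rows = [user_index[u] for u, _, _ in ratings]
    cols = [item_index[i] for _, i, _ in ratings]
    vals = [float(r) for _, _, r in ratings]
    return user_index, item_index, rows, cols, vals


# Made-up interaction data for illustration only.
ratings = [(42, "b", 4.0), (7, "a", 3.5), (42, "a", 5.0)]
user_index, item_index, rows, cols, vals = encode(ratings)
```

The rows, cols, and vals lists are in the coordinate layout that sparse-matrix constructors typically accept, which is one reason a shared ID-to-index mapping saves each component from recomputing it.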
New Features (incremental)#
Model Behavior Changes#
Most models will exhibit some changes, hopefully mostly in performance, due to moving to PyTorch. There are some deliberate behavior changes in this new version, however, documented here.
ALS models only use Cholesky decomposition (previously selected with the erroneously-named method="lu" option); conjugate gradient and coordinate descent are no longer available. Cholesky decomposition is faster on PyTorch than it was with Numba, and is easier to maintain.
The default minimum similarity for UserUser is now \(10^{-6}\).
k-NN algorithms no longer support negative similarities; min_sim is clamped to be at least the smallest normal in 32-bit floating point (\(1.175 \times 10^{-38}\)).
The implicit bridge algorithms no longer look at rating values when they are present.
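The min_sim clamping described above can be sketched as follows. This is not LensKit's implementation: the helper names clamp_min_sim and filter_neighbors are hypothetical, and treating the threshold as inclusive is an assumption made for the example. The constant is the smallest positive normal value in IEEE 754 single precision, \(2^{-126}\).

```python
# Hypothetical sketch of the clamping rule; not LensKit's actual code.

# Smallest positive normal value in IEEE 754 single precision (float32).
FLOAT32_SMALLEST_NORMAL = 2.0 ** -126  # approximately 1.175e-38


def clamp_min_sim(min_sim):
    """Raise min_sim, if necessary, to the smallest normal float32 value."""
    return max(float(min_sim), FLOAT32_SMALLEST_NORMAL)


def filter_neighbors(sims, min_sim=1e-6):
    """Keep (neighbor, similarity) pairs at or above the clamped threshold.

    Assumption: the comparison is inclusive; the release notes do not say.
    """
    threshold = clamp_min_sim(min_sim)
    return [(name, sim) for name, sim in sims if sim >= threshold]


# Made-up similarities for illustration only.
sims = [("a", 0.5), ("b", -0.2), ("c", 0.0), ("d", 1e-7)]
kept_default = filter_neighbors(sims)              # threshold is 1e-6
kept_negative = filter_neighbors(sims, min_sim=-1)  # clamped to 2**-126
```

Because the threshold is always strictly positive after clamping, zero and negative similarities are dropped even when a caller passes a negative min_sim.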
Bug Fixes#
lenskit.util.clone() now properly clones tuples (⛙ 358).
Corrected documentation errors for recall() and hit() (⛙ 369 by @lukas-wegmeth).
Dependencies and Maintenance#
Bumped minimum supported dependencies as per SPEC0 (Python 3.10, NumPy 1.23, Pandas 1.5, SciPy 1.9).
Added support for Pandas 2 (⛙ 364) and Python 3.12.
Improved Apple testing to include vanilla Python and Apple Silicon (⛙ 366).
Updated build environment, dependency setup, task running, and CI to be more consistent and maintainable.
Code is now auto-formatted with Ruff.
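For readers who want to check an environment against the minimum versions listed above, here is a minimal sketch using only the standard library. The helper names parse_release, find_outdated, and installed_versions are hypothetical, and the version parsing is deliberately simplistic (leading numeric components only); a real project would more likely rely on packaging.version.Version.

```python
import sys
from importlib import metadata

# Minimums copied from the list above.
MIN_PYTHON = (3, 10)
MINIMUMS = {"numpy": (1, 23), "pandas": (1, 5), "scipy": (1, 9)}


def parse_release(version):
    """Parse leading numeric components, e.g. '1.23.5' -> (1, 23, 5)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def find_outdated(installed):
    """Return names in `installed` (name -> version string) below the minimum."""
    return [
        name
        for name, minimum in MINIMUMS.items()
        if name in installed and parse_release(installed[name]) < minimum
    ]


def installed_versions():
    """Look up installed versions of the listed packages, skipping absent ones."""
    found = {}
    for name in MINIMUMS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


python_ok = sys.version_info[:2] >= MIN_PYTHON
outdated = find_outdated(installed_versions())
```

An empty outdated list together with python_ok being true means the installed packages that were found meet the listed minimums; packages that are not installed are skipped rather than reported.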