2025 Releases#
2025 brought breaking changes across the LensKit APIs to improve ergonomics, correctness-by-default, and flexibility. The 2025 releases also adopt SPEC0, a standard for supported versions of scientific Python libraries, and change the LensKit version number scheme to “SemCalVer”. See Migrating from LensKit 0.x for information on how to upgrade your code.
2025.3.0 (in progress)#
Replace broken “softmax” sampler with a proper stochastic sampler (⛙ 667).
sample_negatives() now accepts "popular" as an alias for "popularity".
lenskit.basic now exports the configuration classes for basic algorithms (⛙ 672).
Several bug fixes for logging in niche setups (including Ray clusters) (⛙ 673).
Added Rust acceleration for some data processing operations and kNN models.
Added versioning to the native data format, documented data format compatibility, and added compatibility tests.
Added compressed sparse row extension types for Arrow, and use them in the LensKit native format (as well as Python/Rust data interchange) to more reliably handle CSR matrix data in Arrow (previously, we had to carry the matrix width or row dimension in side information; it is now embedded into the Arrow metadata).
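To illustrate why the matrix width has to travel with CSR data, here is a minimal pure-Python sketch. The `CSR` class and its field names are hypothetical and are not LensKit's Arrow extension types; the point is only that the three CSR arrays alone do not determine the number of columns.

```python
from dataclasses import dataclass


@dataclass
class CSR:
    """Minimal compressed sparse row matrix (illustrative, not LensKit's type)."""

    n_cols: int          # width: cannot be recovered from the arrays below
    indptr: list[int]    # row i occupies positions indptr[i]:indptr[i + 1]
    indices: list[int]   # column index of each stored value
    values: list[float]  # the stored values themselves

    @property
    def n_rows(self) -> int:
        # The row count is implied by the row-pointer array.
        return len(self.indptr) - 1

    def to_dense(self) -> list[list[float]]:
        # Expand to a dense list-of-lists, filling missing cells with 0.0.
        dense = [[0.0] * self.n_cols for _ in range(self.n_rows)]
        for row in range(self.n_rows):
            for pos in range(self.indptr[row], self.indptr[row + 1]):
                dense[row][self.indices[pos]] = self.values[pos]
        return dense


# Identical CSR arrays, different widths: only n_cols tells them apart.
narrow = CSR(n_cols=3, indptr=[0, 2, 3], indices=[0, 2, 1], values=[1.0, 2.0, 3.0])
wide = CSR(n_cols=5, indptr=[0, 2, 3], indices=[0, 2, 1], values=[1.0, 2.0, 3.0])
```

Because trailing empty columns leave no trace in `indptr`, `indices`, or `values`, a format that stores only those arrays needs the width from somewhere else, which is the role the embedded metadata plays in the change described above.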
Component Changes#
ItemKNNScorer and UserKNNScorer are rewritten to use Rust acceleration, along with changes to their internal data representation to use Arrow instead of SciPy. This also fixes a segfault with very large similarity matrices.
Note
The model parameters of the KNN scorers have changed. They are no longer suffixed with _, and the similarity matrix is a PyArrow list array. Code that was directly examining internal elements will need to change.
2025.2.0#
Some small quality-of-life improvements (and removing invalid API compat).
Add lenskit.pipeline.PipelineCache to share components between pipelines (⛙ 605).
Only warn once for users without test data in bulk analysis (⛙ 664, 🐞 663).
Allow a Pandas data frame to be passed as the test data to the batch recommender (⛙ 660).
Note
This removes extra keyword arguments from the convenience functions (batch.recommend, etc.) that were leftovers from LensKit 0.14 and no longer did anything.
Support auto-detecting keys in lenskit.data.ItemListCollection.from_df() (⛙ 659).
2025.1.1#
LensKit 2025.1.1 was released March 7, 2025.
The changes in this release are too numerous and fundamental to fully document in traditional release notes. See the following for release update documentation:
Migrating from LensKit 0.x for conceptual changes and how to upgrade your code.
The notes below for behavior changes (e.g. new defaults, new metric capabilities), and small bits not covered in the migration guide.
The full changelog in the Git history and issue/PR milestone.
LensKit 2025 has many breaking changes, with the migration guide (Migrating from LensKit 0.x) documenting the major ones. Below are some smaller ones not covered by that document:
Where Pandas data frames are still used, the standard user and item columns have been renamed to user_id and item_id respectively, with user_num and item_num for 0-based user and item numbers. This is to remove ambiguity about how users and items are being referenced.
The Popular recommender has been removed in favor of PopScore.
The DCG metric has been removed, as it is basically never used and was not useful as a part of the NDCG implementation.
Many LensKit components (batch running, model training, etc.) now report progress through the progress API in lenskit.logging.progress, and can be connected to Jupyter or Rich.
Added RBP top-N metric (⛙ 334).
Added command-line tool to fetch datasets (⛙ 347).
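For readers unfamiliar with the RBP (rank-biased precision) metric listed above, the following is a minimal sketch of the standard formula with binary relevance, RBP = (1 - p) · Σ p^(rank) over relevant positions, counting ranks from 0. The function name, its signature, and the default patience of 0.5 are assumptions made for illustration; this is not LensKit's implementation and may differ from it in normalization and defaults.

```python
def rank_biased_precision(ranked_items, relevant_items, patience=0.5):
    """Binary-relevance RBP for one ranked list (illustrative sketch).

    `patience` is the probability that a user continues from one rank
    to the next; 0.5 is an arbitrary value chosen for this example.
    """
    relevant = set(relevant_items)
    total = 0.0
    for rank, item in enumerate(ranked_items):  # rank 0 is the top position
        if item in relevant:
            total += patience ** rank
    return (1.0 - patience) * total


# Relevant items at ranks 0 and 2: (1 - 0.5) * (0.5**0 + 0.5**2)
score = rank_biased_precision(["a", "b", "c"], {"a", "c"}, patience=0.5)
```

Higher patience values spread weight further down the list, so the same ranking can score differently depending on how deep a user is assumed to read.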
Important
Some LensKit metric defaults have changed; this results in values that differ from those computed by previous versions, but that are either more correct or more consistent with common practice.
The NDCG metric now defaults to ignoring rating values.
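As a rough illustration of what ignoring rating values means for NDCG, the sketch below computes NDCG with binary gain: every relevant item contributes a gain of 1 regardless of its rating. The function `binary_ndcg`, its parameters, and the choice of ideal ranking are assumptions for this example and are not taken from LensKit's code.

```python
import math


def binary_ndcg(ranked_items, relevant_items, k=None):
    """NDCG with binary gain: relevant items count as 1, ratings are ignored."""
    relevant = set(relevant_items)
    ranked = list(ranked_items) if k is None else list(ranked_items)[:k]

    # DCG: each relevant item at 0-based position `pos` adds 1 / log2(pos + 2).
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked)
        if item in relevant
    )

    # Ideal DCG: all relevant items placed at the top of the list.
    n_ideal = len(relevant) if k is None else min(len(relevant), k)
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(n_ideal))

    return dcg / ideal if ideal > 0 else 0.0


perfect = binary_ndcg(["a", "c", "b"], {"a", "c"})   # relevant items first
imperfect = binary_ndcg(["a", "b", "c"], {"a", "c"})  # one relevant item demoted
```

With graded gain, two relevant items with different ratings would contribute differently; with binary gain only their positions matter.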
Most models will exhibit some changes, hopefully mostly in performance, due to moving to PyTorch. There are some deliberate behavior changes in this new version, however, documented here.
ALS models only use Cholesky decomposition (previously selected with the erroneously-named method="lu" option); conjugate gradient and coordinate descent are no longer available. Cholesky decomposition is faster on PyTorch than it was with Numba, and is easier to maintain.
The default minimum similarity for UserUser is now \(10^{-6}\).
k-NN algorithms no longer support negative similarities; min_sim is clamped to be at least the smallest normal in 32-bit floating point (approximately \(1.175 \times 10^{-38}\)).
The implicit bridge algorithms no longer look at rating values when they are present.
Bias is no longer optional for BiasedMFScorer and FunkSVD; both are inherently biased models, and FunkSVD is not commonly used.
lenskit.hpf.HPF no longer uses ratings as synthetic counts by default.
Corrected documentation errors for recall() and hit() (⛙ 369 by @lukas-wegmeth).
Bumped minimum supported dependencies as per SPEC0 (Python 3.11, NumPy 1.24, Pandas 2.0, SciPy 1.10).
Added support for Pandas 2 (⛙ 364) and Python 3.12.
Improved Apple testing to include vanilla Python and Apple Silicon (⛙ 366).
Updated build environment, dependency setup, task running, and CI to be more consistent and maintainable.
Removed legacy random code and SeedBank usage in favor of SPEC 7 (see Random Seeds).
Code is now auto-formatted with Ruff.
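For context on the SPEC 7 item above: SPEC 7 recommends that functions take an `rng` argument and normalize it with `numpy.random.default_rng`. The sketch below shows that pattern in general; `sample_items` is a hypothetical function written for this example and is not part of LensKit.

```python
import numpy as np


def sample_items(items, n, rng=None):
    """Draw `n` distinct items, seeded SPEC 7 style (illustrative helper).

    `rng` may be None (fresh entropy), an integer seed, or an existing
    numpy.random.Generator; default_rng() accepts all three.
    """
    rng = np.random.default_rng(rng)
    return rng.choice(items, size=n, replace=False).tolist()


items = ["a", "b", "c", "d", "e"]
first = sample_items(items, 3, rng=42)
second = sample_items(items, 3, rng=42)  # same seed, same draw

# Passing a Generator lets several calls share one random stream.
shared = np.random.default_rng(7)
draw = sample_items(items, 2, rng=shared)
```

Accepting either a seed or a generator keeps individual calls reproducible while still letting callers control one stream across a whole experiment.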