lenskit.metrics.topn#

Top-N evaluation metrics.

Functions

bulk_impl(metric)

Decorator to register a bulk implementation for a metric.

dcg(recs, truth[, discount])

Compute the unnormalized discounted cumulative gain [JarvelinKekalainen02].

hit(recs, truth[, k])

Compute whether or not a list is a hit; any list with at least one relevant item in the first \(k\) positions (\(L_{\le k} \cap I_u^{\mathrm{test}} \ne \emptyset\)) is scored as 1, and lists with no relevant items as 0.

ndcg(recs, truth[, discount, k])

Compute the normalized discounted cumulative gain [JarvelinKekalainen02].

precision(recs, truth[, k])

Compute recommendation precision.

rbp(recs, truth[, k, patience, normalize])

Evaluate recommendations with rank-biased precision [MZ08] with a patience parameter \(\gamma\).

recall(recs, truth[, k])

Compute recommendation recall.

recip_rank(recs, truth[, k])

Compute the reciprocal rank [KV97] of the first relevant item in a list of recommendations.

lenskit.metrics.topn.bulk_impl(metric)#

Decorator to register a bulk implementation for a metric.

lenskit.metrics.topn.precision(recs, truth, k=None)#

Compute recommendation precision. This is computed as:

\[\frac{|L \cap I_u^{\mathrm{test}}|}{|L|}\]

In the uncommon case that k is specified and len(recs) < k, this metric uses len(recs) as the denominator.

This metric has a bulk implementation.

Parameters:
  • recs (DataFrame) – The recommendation list. This is expected to have a column item with the recommended item IDs; all other columns are ignored.

  • truth (DataFrame) – The user’s test data. It is expected to be indexed by item ID. If it has a rating column, that is used as the item gains; otherwise, each item has gain 1. All other columns are ignored.

  • k (int | None) – The maximum list length to consider.

Return type:

float | None
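The formula above can be sketched directly in pandas; this is a minimal illustration of the computation, not the library's implementation, and the function name is hypothetical.

```python
import pandas as pd

def precision_sketch(recs: pd.DataFrame, truth: pd.DataFrame, k=None):
    # Sketch of precision: |L ∩ I_test| / |L|, truncating L to the first k items.
    items = recs["item"] if k is None else recs["item"].iloc[:k]
    if len(items) == 0:
        return None
    return items.isin(truth.index).sum() / len(items)

recs = pd.DataFrame({"item": [10, 20, 30, 40]})
truth = pd.DataFrame({"rating": [4.0, 5.0]}, index=pd.Index([20, 50], name="item"))
print(precision_sketch(recs, truth))  # 1 relevant item of 4 recommended -> 0.25
```

Note that when `k` truncates a longer list, the denominator is the truncated length, matching the `len(recs) < k` behavior described above.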

lenskit.metrics.topn.recall(recs, truth, k=None)#

Compute recommendation recall. This is computed as:

\[\frac{|L \cap I_u^{\mathrm{test}}|}{\operatorname{min}\{|I_u^{\mathrm{test}}|, k\}}\]

This metric has a bulk implementation.

Parameters:
  • recs (DataFrame) – The recommendation list. This is expected to have a column item with the recommended item IDs; all other columns are ignored.

  • truth (DataFrame) – The user’s test data. It is expected to be indexed by item ID. If it has a rating column, that is used as the item gains; otherwise, each item has gain 1. All other columns are ignored.

  • k (int | None) – The maximum list length to consider.

Return type:

float | None
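A corresponding pandas sketch of the recall formula, again illustrative rather than the library implementation:

```python
import pandas as pd

def recall_sketch(recs: pd.DataFrame, truth: pd.DataFrame, k=None):
    # Sketch of recall: |L ∩ I_test| / min(|I_test|, k);
    # with k=None the denominator is the full test-set size.
    items = recs["item"] if k is None else recs["item"].iloc[:k]
    hits = items.isin(truth.index).sum()
    denom = len(truth) if k is None else min(len(truth), k)
    return hits / denom

recs = pd.DataFrame({"item": [10, 20, 30, 40]})
truth = pd.DataFrame({"rating": [4.0, 5.0]}, index=pd.Index([20, 50], name="item"))
print(recall_sketch(recs, truth))  # 1 of 2 test items recovered -> 0.5
```

Capping the denominator at \(k\) keeps the metric achievable: a length-\(k\) list can never contain more than \(k\) test items.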

lenskit.metrics.topn.hit(recs, truth, k=None)#

Compute whether or not a list is a hit; any list with at least one relevant item in the first \(k\) positions (\(L_{\le k} \cap I_u^{\mathrm{test}} \ne \emptyset\)) is scored as 1, and lists with no relevant items as 0. When averaged over the recommendation lists, this computes the hit rate [DK04].

This metric has a bulk implementation.

Parameters:
  • recs (DataFrame) – The recommendation list. This is expected to have a column item with the recommended item IDs; all other columns are ignored.

  • truth (DataFrame) – The user’s test data. It is expected to be indexed by item ID. If it has a rating column, that is used as the item gains; otherwise, each item has gain 1. All other columns are ignored.

  • k (int | None) – The maximum list length to consider.

Return type:

float | None
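The hit criterion reduces to a single membership test; a minimal sketch (illustrative, not the library code):

```python
import pandas as pd

def hit_sketch(recs: pd.DataFrame, truth: pd.DataFrame, k=None):
    # Sketch of hit: 1 if any of the first k recommended items is in the
    # test data, else 0. Averaging over users yields the hit rate.
    items = recs["item"] if k is None else recs["item"].iloc[:k]
    return 1.0 if items.isin(truth.index).any() else 0.0

recs = pd.DataFrame({"item": [10, 20, 30, 40]})
truth = pd.DataFrame({"rating": [4.0, 5.0]}, index=pd.Index([20, 50], name="item"))
print(hit_sketch(recs, truth))  # item 20 is relevant -> 1.0
```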

lenskit.metrics.topn.recip_rank(recs, truth, k=None)#

Compute the reciprocal rank [KV97] of the first relevant item in a list of recommendations. Let \(\kappa\) denote the 1-based rank of the first relevant item in \(L\), with \(\kappa=\infty\) if none of the first \(k\) items in \(L\) are relevant; then the reciprocal rank is \(1 / \kappa\). If no elements are relevant, the reciprocal rank is therefore 0. Deshpande and Karypis [DK04] call this the “reciprocal hit rate”.

This metric has a bulk implementation.

Parameters:
  • recs (DataFrame) – The recommendation list. This is expected to have a column item with the recommended item IDs; all other columns are ignored.

  • truth (DataFrame) – The user’s test data. It is expected to be indexed by item ID. If it has a rating column, that is used as the item gains; otherwise, each item has gain 1. All other columns are ignored.

  • k (int | None) – The maximum list length to consider.

Return type:

float | None
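The definition of \(\kappa\) above can be sketched with NumPy; this is an illustration of the formula, not the library implementation:

```python
import numpy as np
import pandas as pd

def recip_rank_sketch(recs: pd.DataFrame, truth: pd.DataFrame, k=None):
    # Sketch of MRR's per-list score: 1/kappa, where kappa is the 1-based
    # rank of the first relevant item; 0 when no item in the first k is relevant.
    items = recs["item"] if k is None else recs["item"].iloc[:k]
    hits = np.flatnonzero(items.isin(truth.index).to_numpy())
    return 1.0 / (hits[0] + 1) if hits.size else 0.0

recs = pd.DataFrame({"item": [10, 20, 30, 40]})
truth = pd.DataFrame({"rating": [4.0, 5.0]}, index=pd.Index([20, 50], name="item"))
print(recip_rank_sketch(recs, truth))  # first relevant item at rank 2 -> 0.5
```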

lenskit.metrics.topn.dcg(recs, truth, discount=<ufunc 'log2'>)#

Compute the unnormalized discounted cumulative gain [JarvelinKekalainen02].

Discounted cumulative gain is computed as:

\[\begin{align*} \mathrm{DCG}(L,u) & = \sum_{i=1}^{|L|} \frac{r_{ui}}{d(i)} \end{align*}\]

Unrated items are assumed to have a utility of 0; if no rating values are provided in the truth frame, item ratings are assumed to be 1.

Parameters:
  • recs (DataFrame) – The recommendation list. This is expected to have a column item with the recommended item IDs; all other columns are ignored.

  • truth (DataFrame) – The user’s test data. It is expected to be indexed by item ID. If it has a rating column, that is used as the item gains; otherwise, each item has gain 1. All other columns are ignored.

  • discount – The rank discount function. Each item’s score will be divided by the discount of its rank, if the discount is greater than 1.

Return type:

float | None
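The DCG formula, including the rule that the discount only applies where it exceeds 1, can be sketched as follows (an illustration under those stated rules, not the library implementation):

```python
import numpy as np
import pandas as pd

def dcg_sketch(recs: pd.DataFrame, truth: pd.DataFrame, discount=np.log2):
    # Item gains: the rating column if present, otherwise 1 per test item.
    if "rating" in truth.columns:
        gains = truth["rating"]
    else:
        gains = pd.Series(1.0, index=truth.index)
    # Gain of each recommended item; unrated items contribute 0.
    r = recs["item"].map(gains).fillna(0.0).to_numpy()
    ranks = np.arange(1, len(r) + 1)
    # Divide by the discount only where it exceeds 1 (log2 of rank 1 is 0).
    d = np.maximum(discount(ranks), 1.0)
    return float(np.sum(r / d))

recs = pd.DataFrame({"item": [1, 2, 3]})
truth = pd.DataFrame({"rating": [3.0, 1.0]}, index=pd.Index([1, 3], name="item"))
print(dcg_sketch(recs, truth))  # 3/1 + 0 + 1/log2(3) ≈ 3.631
```

Clamping the discount at 1 keeps the first two ranks undiscounted under the default `log2`, since \(\log_2 1 = 0\) and \(\log_2 2 = 1\).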

lenskit.metrics.topn.ndcg(recs, truth, discount=<ufunc 'log2'>, k=None)#

Compute the normalized discounted cumulative gain [JarvelinKekalainen02].

Discounted cumulative gain is computed as:

\[\begin{align*} \mathrm{DCG}(L,u) & = \sum_{i=1}^{|L|} \frac{r_{ui}}{d(i)} \end{align*}\]

Unrated items are assumed to have a utility of 0; if no rating values are provided in the truth frame, item ratings are assumed to be 1.

This is then normalized as follows:

\[\begin{align*} \mathrm{nDCG}(L, u) & = \frac{\mathrm{DCG}(L,u)}{\mathrm{DCG}(L_{\mathrm{ideal}}, u)} \end{align*}\]

This metric has a bulk implementation.

Parameters:
  • recs (DataFrame) – The recommendation list. This is expected to have a column item with the recommended item IDs; all other columns are ignored.

  • truth (DataFrame) – The user’s test data. It is expected to be indexed by item ID. If it has a rating column, that is used as the item gains; otherwise, each item has gain 1. All other columns are ignored.

  • discount (Callable) – The rank discount function. Each item’s score will be divided by the discount of its rank, if the discount is greater than 1.

  • k (int | None) – The maximum list length.

Return type:

float | None
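The normalization step divides the achieved DCG by the DCG of an ideal ranking of the test items. A self-contained sketch of both steps (illustrative, not the library implementation):

```python
import numpy as np
import pandas as pd

def ndcg_sketch(recs: pd.DataFrame, truth: pd.DataFrame, discount=np.log2, k=None):
    # Item gains: the rating column if present, otherwise 1 per test item.
    if "rating" in truth.columns:
        gains = truth["rating"]
    else:
        gains = pd.Series(1.0, index=truth.index)

    def dcg(values):
        # Sum of gains divided by the rank discount (clamped at 1).
        ranks = np.arange(1, len(values) + 1)
        return float(np.sum(values / np.maximum(discount(ranks), 1.0)))

    items = recs["item"] if k is None else recs["item"].iloc[:k]
    achieved = dcg(items.map(gains).fillna(0.0).to_numpy())
    # Ideal DCG: test-item gains sorted best-first, truncated to k.
    ideal = np.sort(gains.to_numpy())[::-1]
    if k is not None:
        ideal = ideal[:k]
    return achieved / dcg(ideal)

recs = pd.DataFrame({"item": [1, 2, 3]})
truth = pd.DataFrame({"rating": [3.0, 1.0]}, index=pd.Index([1, 3], name="item"))
print(ndcg_sketch(recs, truth))  # ≈ 3.631 / 4.0 ≈ 0.908
```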

lenskit.metrics.topn.rbp(recs, truth, k=None, patience=0.5, normalize=False)#

Evaluate recommendations with rank-biased precision [MZ08] with a patience parameter \(\gamma\).

With binary relevance values \(r_{ui} \in \{0, 1\}\), this is computed by:

\[\begin{align*} \operatorname{RBP}_\gamma(L, u) & = (1 - \gamma) \sum_{i=1}^{|L|} r_{ui} \gamma^{i-1} \end{align*}\]

The original RBP metric relies on the fact that the rank-biased sum of binary relevance scores in an infinitely-long, perfectly-precise list is \(1/(1 - \gamma)\). In recommender evaluation, however, we usually have a small test set, so the maximum achievable RBP is significantly lower and depends on the number of test items. With normalize=True, the RBP score is normalized by the maximum achievable with the provided test data.

Parameters:
  • recs (DataFrame) – The recommendation list. This is expected to have a column item with the recommended item IDs; all other columns are ignored.

  • truth (DataFrame) – The user’s test data. It is expected to be indexed by item ID. If it has a rating column, that is used as the item gains; otherwise, each item has gain 1. All other columns are ignored.

  • k (int | None) – The maximum recommendation list length.

  • patience (float) – The patience parameter \(\gamma\), the probability that the user continues browsing at each point.

  • normalize (bool) – Whether to normalize the RBP scores; if True, divides the RBP score by the maximum achievable with the test data (as in nDCG).

Return type:

float | None
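The RBP sum and the normalization described above can be sketched as follows; the normalization (all of the first \(\min(|I_u^{\mathrm{test}}|, k)\) positions relevant) is this sketch's reading of the docs, and the whole function is illustrative rather than the library implementation:

```python
import numpy as np
import pandas as pd

def rbp_sketch(recs, truth, k=None, patience=0.5, normalize=False):
    # Sketch of RBP: (1 - gamma) * sum_i r_ui * gamma^(i-1) over the first k items.
    items = recs["item"] if k is None else recs["item"].iloc[:k]
    rel = items.isin(truth.index).to_numpy().astype(float)
    weights = patience ** np.arange(len(rel))  # gamma^(i-1) for 1-based ranks i
    score = (1.0 - patience) * float(np.sum(rel * weights))
    if normalize:
        # Best case: every test item fills the top of the list.
        n = len(truth) if k is None else min(len(truth), k)
        best = (1.0 - patience) * float(np.sum(patience ** np.arange(n)))
        score /= best
    return score

recs = pd.DataFrame({"item": [1, 2, 3]})
truth = pd.DataFrame({"rating": [3.0, 1.0]}, index=pd.Index([1, 3], name="item"))
print(rbp_sketch(recs, truth))  # 0.5 * (1 + 0.25) = 0.625
```

Without normalization, RBP rewards relevant items near the top with geometrically decaying weight; the normalized variant rescales so a perfect placement of all test items scores 1, analogous to nDCG.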