lenskit.metrics.topn#

Top-N evaluation metrics.

Functions

bulk_impl(metric)

Decorator to register a bulk implementation for a metric.

dcg(recs, truth[, discount])

Compute the unnormalized discounted cumulative gain [JarvelinKekalainen02].

hit(recs, truth[, k])

Compute whether or not a list is a hit; any list with at least one relevant item in the first \(k\) positions (\(L_{\le k} \cap I_u^{\mathrm{test}} \ne \emptyset\)) is scored as 1, and lists with no relevant items as 0.

ndcg(recs, truth[, discount, k])

Compute the normalized discounted cumulative gain [JarvelinKekalainen02].

precision(recs, truth[, k])

Compute recommendation precision.

rbp(recs, truth[, k, patience, normalize])

Evaluate recommendations with rank-biased precision [MZ08] with a patience parameter \(\gamma\).

recall(recs, truth[, k])

Compute recommendation recall.

recip_rank(recs, truth[, k])

Compute the reciprocal rank [KV97] of the first relevant item in a list of recommendations.

lenskit.metrics.topn.bulk_impl(metric)#

Decorator to register a bulk implementation for a metric.

lenskit.metrics.topn.precision(recs, truth, k=None)#

Compute recommendation precision. This is computed as:

\[\frac{|L \cap I_u^{\mathrm{test}}|}{|L|}\]

In the uncommon case that k is specified and len(recs) < k, this metric uses len(recs) as the denominator.

This metric has a bulk implementation.

Parameters:
  • recs (DataFrame) – The recommendation list. This is expected to have a column item with the recommended item IDs; all other columns are ignored.

  • truth (DataFrame) – The user’s test data. It is expected to be indexed by item ID. If it has a rating column, that is used as the item gains; otherwise, each item has gain 1. All other columns are ignored.

  • k (int | None) – The maximum list length to consider.

Return type:

float | None
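The formula above can be sketched directly in pandas; this is a minimal illustration of the computation, not the library's implementation, and the function name is hypothetical.

```python
import pandas as pd

def precision_sketch(recs: pd.DataFrame, truth: pd.DataFrame, k=None):
    # Sketch of precision: |L ∩ I_test| / |L|, truncating L to the first k items.
    items = recs["item"] if k is None else recs["item"].iloc[:k]
    if len(items) == 0:
        return None
    return items.isin(truth.index).sum() / len(items)

recs = pd.DataFrame({"item": [10, 20, 30, 40]})
truth = pd.DataFrame({"rating": [4.0, 5.0]}, index=pd.Index([20, 50], name="item"))
print(precision_sketch(recs, truth))  # 1 relevant item of 4 recommended -> 0.25
```

Note that when `k` truncates a longer list, the denominator is the truncated length, matching the `len(recs) < k` behavior described above.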

lenskit.metrics.topn.recall(recs, truth, k=None)#

Compute recommendation recall. This is computed as:

\[\frac{|L \cap I_u^{\mathrm{test}}|}{\operatorname{min}\{|I_u^{\mathrm{test}}|, k\}}\]

This metric has a bulk implementation.

Parameters:
  • recs (DataFrame) – The recommendation list. This is expected to have a column item with the recommended item IDs; all other columns are ignored.

  • truth (DataFrame) – The user’s test data. It is expected to be indexed by item ID. If it has a rating column, that is used as the item gains; otherwise, each item has gain 1. All other columns are ignored.

  • k (int | None) – The maximum list length to consider.

Return type:

float | None
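A corresponding pandas sketch of the recall formula, again illustrative rather than the library implementation:

```python
import pandas as pd

def recall_sketch(recs: pd.DataFrame, truth: pd.DataFrame, k=None):
    # Sketch of recall: |L ∩ I_test| / min(|I_test|, k);
    # with k=None the denominator is the full test-set size.
    items = recs["item"] if k is None else recs["item"].iloc[:k]
    hits = items.isin(truth.index).sum()
    denom = len(truth) if k is None else min(len(truth), k)
    return hits / denom

recs = pd.DataFrame({"item": [10, 20, 30, 40]})
truth = pd.DataFrame({"rating": [4.0, 5.0]}, index=pd.Index([20, 50], name="item"))
print(recall_sketch(recs, truth))  # 1 of 2 test items recovered -> 0.5
```

Capping the denominator at \(k\) keeps the metric achievable: a length-\(k\) list can never contain more than \(k\) test items.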

lenskit.metrics.topn.hit(recs, truth, k=None)#

Compute whether or not a list is a hit; any list with at least one relevant item in the first \(k\) positions (\(L_{\le k} \cap I_u^{\mathrm{test}} \ne \emptyset\)) is scored as 1, and lists with no relevant items as 0. When averaged over the recommendation lists, this computes the hit rate [DK04].

This metric has a bulk implementation.

Parameters:
  • recs (DataFrame) – The recommendation list. This is expected to have a column item with the recommended item IDs; all other columns are ignored.

  • truth (DataFrame) – The user’s test data. It is expected to be indexed by item ID. If it has a rating column, that is used as the item gains; otherwise, each item has gain 1. All other columns are ignored.

  • k (int | None) – The maximum list length to consider.

Return type:

float | None
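The hit criterion reduces to a single membership test; a minimal sketch (illustrative, not the library code):

```python
import pandas as pd

def hit_sketch(recs: pd.DataFrame, truth: pd.DataFrame, k=None):
    # Sketch of hit: 1 if any of the first k recommended items is in the
    # test data, else 0. Averaging over users yields the hit rate.
    items = recs["item"] if k is None else recs["item"].iloc[:k]
    return 1.0 if items.isin(truth.index).any() else 0.0

recs = pd.DataFrame({"item": [10, 20, 30, 40]})
truth = pd.DataFrame({"rating": [4.0, 5.0]}, index=pd.Index([20, 50], name="item"))
print(hit_sketch(recs, truth))  # item 20 is relevant -> 1.0
```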

lenskit.metrics.topn.recip_rank(recs, truth, k=None)#

Compute the reciprocal rank [KV97] of the first relevant item in a list of recommendations. Let \(\kappa\) denote the 1-based rank of the first relevant item in \(L\), with \(\kappa=\infty\) if none of the first \(k\) items in \(L\) are relevant; then the reciprocal rank is \(1 / \kappa\). If no elements are relevant, the reciprocal rank is therefore 0. Deshpande and Karypis [DK04] call this the “reciprocal hit rate”.

This metric has a bulk implementation.

Parameters:
  • recs (DataFrame) – The recommendation list. This is expected to have a column item with the recommended item IDs; all other columns are ignored.

  • truth (DataFrame) – The user’s test data. It is expected to be indexed by item ID. If it has a rating column, that is used as the item gains; otherwise, each item has gain 1. All other columns are ignored.

  • k (int | None) – The maximum list length to consider.

Return type:

float | None
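The definition of \(\kappa\) above can be sketched with NumPy; this is an illustration of the formula, not the library implementation:

```python
import numpy as np
import pandas as pd

def recip_rank_sketch(recs: pd.DataFrame, truth: pd.DataFrame, k=None):
    # Sketch of MRR's per-list score: 1/kappa, where kappa is the 1-based
    # rank of the first relevant item; 0 when no item in the first k is relevant.
    items = recs["item"] if k is None else recs["item"].iloc[:k]
    hits = np.flatnonzero(items.isin(truth.index).to_numpy())
    return 1.0 / (hits[0] + 1) if hits.size else 0.0

recs = pd.DataFrame({"item": [10, 20, 30, 40]})
truth = pd.DataFrame({"rating": [4.0, 5.0]}, index=pd.Index([20, 50], name="item"))
print(recip_rank_sketch(recs, truth))  # first relevant item at rank 2 -> 0.5
```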

lenskit.metrics.topn.dcg(recs, truth, discount=<ufunc 'log2'>)#

Compute the unnormalized discounted cumulative gain [JarvelinKekalainen02].

Discounted cumulative gain is computed as:

\[\begin{align*} \mathrm{DCG}(L,u) & = \sum_{i=1}^{|L|} \frac{r_{ui}}{d(i)} \end{align*}\]

Unrated items are assumed to have a utility of 0; if no rating values are provided in the truth frame, item ratings are assumed to be 1.

Parameters:
  • recs (DataFrame) – The recommendation list. This is expected to have a column item with the recommended item IDs; all other columns are ignored.

  • truth (DataFrame) – The user’s test data. It is expected to be indexed by item ID. If it has a rating column, that is used as the item gains; otherwise, each item has gain 1. All other columns are ignored.

  • discount – The rank discount function. Each item’s score will be divided by the discount of its rank, if the discount is greater than 1.

Return type:

float | None
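The DCG formula, including the rule that the discount only applies where it exceeds 1, can be sketched as follows (an illustration under those stated rules, not the library implementation):

```python
import numpy as np
import pandas as pd

def dcg_sketch(recs: pd.DataFrame, truth: pd.DataFrame, discount=np.log2):
    # Item gains: the rating column if present, otherwise 1 per test item.
    if "rating" in truth.columns:
        gains = truth["rating"]
    else:
        gains = pd.Series(1.0, index=truth.index)
    # Gain of each recommended item; unrated items contribute 0.
    r = recs["item"].map(gains).fillna(0.0).to_numpy()
    ranks = np.arange(1, len(r) + 1)
    # Divide by the discount only where it exceeds 1 (log2 of rank 1 is 0).
    d = np.maximum(discount(ranks), 1.0)
    return float(np.sum(r / d))

recs = pd.DataFrame({"item": [1, 2, 3]})
truth = pd.DataFrame({"rating": [3.0, 1.0]}, index=pd.Index([1, 3], name="item"))
print(dcg_sketch(recs, truth))  # 3/1 + 0 + 1/log2(3) ≈ 3.631
```

Clamping the discount at 1 keeps the first two ranks undiscounted under the default `log2`, since \(\log_2 1 = 0\) and \(\log_2 2 = 1\).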

lenskit.metrics.topn.ndcg(recs, truth, discount=<ufunc 'log2'>, k=None)#

Compute the normalized discounted cumulative gain [JarvelinKekalainen02].

Discounted cumulative gain is computed as:

\[\begin{align*} \mathrm{DCG}(L,u) & = \sum_{i=1}^{|L|} \frac{r_{ui}}{d(i)} \end{align*}\]

Unrated items are assumed to have a utility of 0; if no rating values are provided in the truth frame, item ratings are assumed to be 1.

This is then normalized as follows:

\[\begin{align*} \mathrm{nDCG}(L, u) & = \frac{\mathrm{DCG}(L,u)}{\mathrm{DCG}(L_{\mathrm{ideal}}, u)} \end{align*}\]

This metric has a bulk implementation.

Parameters:
  • recs (DataFrame) – The recommendation list. This is expected to have a column item with the recommended item IDs; all other columns are ignored.

  • truth (DataFrame) – The user’s test data. It is expected to be indexed by item ID. If it has a rating column, that is used as the item gains; otherwise, each item has gain 1. All other columns are ignored.

  • discount (Callable) – The rank discount function. Each item’s score will be divided by the discount of its rank, if the discount is greater than 1.

  • k (int | None) – The maximum list length.

Return type:

float | None
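The normalization step divides the achieved DCG by the DCG of an ideal ranking of the test items. A self-contained sketch of both steps (illustrative, not the library implementation):

```python
import numpy as np
import pandas as pd

def ndcg_sketch(recs: pd.DataFrame, truth: pd.DataFrame, discount=np.log2, k=None):
    # Item gains: the rating column if present, otherwise 1 per test item.
    if "rating" in truth.columns:
        gains = truth["rating"]
    else:
        gains = pd.Series(1.0, index=truth.index)

    def dcg(values):
        # Sum of gains divided by the rank discount (clamped at 1).
        ranks = np.arange(1, len(values) + 1)
        return float(np.sum(values / np.maximum(discount(ranks), 1.0)))

    items = recs["item"] if k is None else recs["item"].iloc[:k]
    achieved = dcg(items.map(gains).fillna(0.0).to_numpy())
    # Ideal DCG: test-item gains sorted best-first, truncated to k.
    ideal = np.sort(gains.to_numpy())[::-1]
    if k is not None:
        ideal = ideal[:k]
    return achieved / dcg(ideal)

recs = pd.DataFrame({"item": [1, 2, 3]})
truth = pd.DataFrame({"rating": [3.0, 1.0]}, index=pd.Index([1, 3], name="item"))
print(ndcg_sketch(recs, truth))  # ≈ 3.631 / 4.0 ≈ 0.908
```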

lenskit.metrics.topn.rbp(recs, truth, k=None, patience=0.5, normalize=False)#

Evaluate recommendations with rank-biased precision [MZ08] with a patience parameter \(\gamma\).

With binary relevance values \(r_{ui} \in \{0, 1\}\), this is computed by:

\[\begin{align*} \operatorname{RBP}_\gamma(L, u) & = (1 - \gamma) \sum_{i=1}^{|L|} r_{ui} \gamma^{i-1} \end{align*}\]

The original RBP metric relies on the fact that the rank-biased sum of binary relevance scores in an infinitely-long, perfectly-precise list is \(1/(1 - \gamma)\). In recommender evaluation, however, we usually have a small test set, so the maximum achievable RBP is significantly lower and depends on the number of test items. With normalize=True, the RBP score is normalized by the maximum achievable with the provided test data.

Parameters:
  • recs (DataFrame) – The recommendation list. This is expected to have a column item with the recommended item IDs; all other columns are ignored.

  • truth (DataFrame) – The user’s test data. It is expected to be indexed by item ID. If it has a rating column, that is used as the item gains; otherwise, each item has gain 1. All other columns are ignored.

  • k (int | None) – The maximum recommendation list length.

  • patience (float) – The patience parameter \(\gamma\), the probability that the user continues browsing at each point.

  • normalize (bool) – Whether to normalize the RBP scores; if True, divides the RBP score by the maximum achievable with the test data (as in nDCG).

Return type:

float | None
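The RBP sum and the normalization described above can be sketched as follows; the normalization (all of the first \(\min(|I_u^{\mathrm{test}}|, k)\) positions relevant) is this sketch's reading of the docs, and the whole function is illustrative rather than the library implementation:

```python
import numpy as np
import pandas as pd

def rbp_sketch(recs, truth, k=None, patience=0.5, normalize=False):
    # Sketch of RBP: (1 - gamma) * sum_i r_ui * gamma^(i-1) over the first k items.
    items = recs["item"] if k is None else recs["item"].iloc[:k]
    rel = items.isin(truth.index).to_numpy().astype(float)
    weights = patience ** np.arange(len(rel))  # gamma^(i-1) for 1-based ranks i
    score = (1.0 - patience) * float(np.sum(rel * weights))
    if normalize:
        # Best case: every test item fills the top of the list.
        n = len(truth) if k is None else min(len(truth), k)
        best = (1.0 - patience) * float(np.sum(patience ** np.arange(n)))
        score /= best
    return score

recs = pd.DataFrame({"item": [1, 2, 3]})
truth = pd.DataFrame({"rating": [3.0, 1.0]}, index=pd.Index([1, 3], name="item"))
print(rbp_sketch(recs, truth))  # 0.5 * (1 + 0.25) = 0.625
```

Without normalization, RBP rewards relevant items near the top with geometrically decaying weight; the normalized variant rescales so a perfect placement of all test items scores 1, analogous to nDCG.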