Novelty

Definition

Novelty

Novelty is a beyond-accuracy evaluation criterion measuring the degree to which recommended items are unknown to the user and different from what the user has seen before. A novel recommendation exposes the user to something they would not have encountered or sought out on their own. Novelty trades off directly against repetition: systems optimized purely for accuracy tend to re-recommend familiar or popular items, which is correct but stale, whereas novelty rewards discovery. It is one of the standard beyond-accuracy objectives alongside Diversity, Serendipity, and Coverage.

Intuition

Repetition vs. Discovery

Users like consuming what they already know (a favorite song on repeat, re-reading a known author), but a recommender that only surfaces the already-known fails its purpose. In news recommendation, repeatedly suggesting articles the user has already read offers no new information, and engagement decays. In music, only ever suggesting known songs or artists blocks discovery, and users get bored.

Novelty is the dimension that captures “have I seen this before?” — distinct from its neighbors:

  • Diversity = items in the list are different from each other (intra-list).
  • Novelty = items are different from the user’s own past (or globally unpopular).
  • Serendipity = items are novel and relevant and surprising (the useful subset of novelty).

So serendipity ⊂ novelty: a novel item can still be useless, but a serendipitous one must first be novel.
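
The three-way distinction can be illustrated with plain sets. This is a minimal sketch, not a metric definition: the item names and the `relevant` and `unexpected` sets are hypothetical stand-ins for whatever relevance and surprise judgments an evaluation would actually use.

```python
def classify(recommended, history, relevant, unexpected):
    """Split a recommendation list into novel and serendipitous items.

    All four arguments are hypothetical toy inputs:
    history    - items the user has already consumed
    relevant   - items the user would actually like
    unexpected - items the user would find surprising
    """
    # Novel: anything the user has not seen before, useful or not.
    novel = {i for i in recommended if i not in history}
    # Serendipitous: the novel items that are also relevant and surprising.
    serendipitous = {i for i in novel if i in relevant and i in unexpected}
    return novel, serendipitous


recommended = ["song_a", "song_b", "song_c", "song_d"]
history = {"song_a"}
relevant = {"song_a", "song_b", "song_c"}
unexpected = {"song_c", "song_d"}

novel, serendipitous = classify(recommended, history, relevant, unexpected)
```

Here `song_d` is novel but not relevant, so it is not serendipitous, while `song_c` satisfies all three conditions. Diversity is not modeled in this sketch because it compares list items with each other rather than with the user's past.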

Mathematical Formulation

The lecture defines novelty conceptually (RS-L02 slide 36) as items being unknown/unseen by the user. The standard operational measure (Vargas & Castells, RecSys 2011), used to make it computable, is self-information / popularity-based novelty: a recommended item is novel in proportion to how unlikely it is to be already known, which is estimated from its global popularity.

$$\mathrm{Novelty}(R_u) = \frac{1}{|R_u|} \sum_{i \in R_u} -\log_2 p(i), \qquad p(i) = \frac{|\{v \in U : v \text{ interacted with } i\}|}{|U|}$$

where:

  • $R_u$: the list of items recommended to user $u$
  • $p(i)$: the probability that a randomly chosen user has interacted with item $i$ (its popularity); estimated as the fraction of users who consumed $i$
  • $-\log_2 p(i)$: the self-information (surprisal) of item $i$; popular items ($p(i) \to 1$) contribute almost no novelty, while rare long-tail items ($p(i) \to 0$) contribute large novelty
  • $|U|$: number of users; $|R_u|$: recommendation list length

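An alternative user-relative (unseen) formulation scores novelty by dissimilarity to the user’s own history rather than to global popularity. One form consistent with this description is

$$\mathrm{Novelty}_{\text{unseen}}(R_u) = \frac{1}{|R_u|} \sum_{i \in R_u} \min_{j \in H_u} d(i, j)$$

where $H_u$ is the set of items user $u$ has already consumed and $d$ is an item-to-item distance (for example, 1 minus cosine similarity), so an item far from everything the user has already consumed counts as novel. Both forms are higher-is-better (↑).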
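
The user-relative formulation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lecture's definition: it assumes each item has an embedding vector, uses cosine distance as the dissimilarity, and takes the distance to the nearest history item, so that an item only counts as novel when it is far from everything the user has consumed. The names `unseen_novelty`, `embedding`, and the toy items are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def unseen_novelty(rec_list, history, embedding):
    """Mean, over recommended items, of the distance to the nearest history item.

    Assumption: nearest-item (min) aggregation; averaging over the history
    would be another reasonable choice.
    """
    if not rec_list or not history:
        raise ValueError("rec_list and history must both be non-empty")
    total = 0.0
    for i in rec_list:
        total += min(cosine_distance(embedding[i], embedding[j]) for j in history)
    return total / len(rec_list)


# Hypothetical 2-d item embeddings: two rock tracks and one jazz track.
embedding = {
    "rock_a": (1.0, 0.0),
    "rock_b": (1.0, 0.0),
    "jazz_a": (0.0, 1.0),
}
history = ["rock_a"]

same_style = unseen_novelty(["rock_b"], history, embedding)
new_style = unseen_novelty(["jazz_a"], history, embedding)
mixed = unseen_novelty(["rock_b", "jazz_a"], history, embedding)
```

In this toy setup the rock-only list scores 0, the jazz-only list scores 1, and the mixed list lands halfway. Unlike the popularity form, the score depends on the individual user's history, so the same list can be novel for one user and stale for another.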

Key Properties / Variants

  • Two reference points. Global novelty (self-information, $-\log_2 p(i)$) measures novelty w.r.t. the whole user population; personalized/unseen novelty measures it w.r.t. the individual’s history $H_u$. The lecture’s verbal definition (“unknown to the user”) is the personalized notion; the popularity form is the common computable proxy.
  • Long-tail link. Novelty mechanically rewards items in the long tail of the popularity curve. Pushing novelty therefore tends to fight Popularity Bias and improve Catalogue Coverage — though novelty (per-list) and coverage (catalog-wide) are not identical.
  • Accuracy trade-off. Novelty conflicts with accuracy metrics like NDCG/Recall: the highest-probability next item is usually a familiar/popular one. RS-L02 frames this as a multi-objective trade-off; over-optimizing accuracy narrows recommendations into a Filter Bubble.
  • Relation to repetition research. In sequential / next-basket recommendation the explore-vs-repeat balance is exactly this tension; repeat-biased methods can score higher on accuracy while explore-biased methods raise novelty.
  • Novelty in generative recommendation (RS-L04). GenRec exposes a structural novelty problem at decode time:
    • Beam search homogeneity / amplification bias: items with similar semantic IDs share opening codes, so Beam Search locks onto a popular prefix and the top-$K$ list collapses into near-duplicates (low novelty), pruning long-tail prefixes early.
    • Decode-time novelty fixes — inject randomness (temperature/sampling), diverse beam search, or post-hoc MMR re-ranking.
    • Train-time novelty fixes — reward novelty/diversity inside the GRPO group (penalize look-alike candidates), or shape the tokenizer (LETTER) so popular items do not all collapse onto one prefix.
    • Evaluation caveat (RS-L04 slide 58). Offline metrics (Recall@K, NDCG@K) credit only the exact logged click, so a genuinely novel-but-good recommendation the user never saw is scored as wrong — benchmarks under-credit the very novelty GenRec is built to provide.
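
Of the decode-time fixes listed above, post-hoc MMR re-ranking is the easiest to sketch. The code below is a minimal greedy version, assuming a relevance score per candidate and a pairwise similarity function. The shared-prefix similarity over semantic IDs, the toy items, and all function names are hypothetical choices for illustration, not part of any GenRec system described in the lecture.

```python
def shared_prefix_similarity(code_a, code_b):
    """Fraction of leading codes two semantic IDs share (hypothetical measure)."""
    shared = 0
    for x, y in zip(code_a, code_b):
        if x != y:
            break
        shared += 1
    return shared / max(len(code_a), len(code_b))


def mmr_rerank(candidates, relevance, similarity, k, lam=0.5):
    """Greedy MMR: trade relevance against similarity to already-selected items."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(item):
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1.0 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: a1 and a2 share a two-code prefix, b1 does not.
semantic_id = {"a1": (1, 2, 3), "a2": (1, 2, 4), "b1": (7, 8, 9)}
relevance = {"a1": 0.9, "a2": 0.85, "b1": 0.6}


def similarity(i, j):
    return shared_prefix_similarity(semantic_id[i], semantic_id[j])


candidates = ["a1", "a2", "b1"]
by_relevance_only = mmr_rerank(candidates, relevance, similarity, k=2, lam=1.0)
with_mmr = mmr_rerank(candidates, relevance, similarity, k=2, lam=0.5)
```

With `lam=1.0` the top-2 list is the two near-duplicates `a1` and `a2`; with `lam=0.5` the less relevant item `b1`, which has a different prefix, replaces `a2`. MMR penalizes similarity between list items, so it raises novelty only indirectly, by letting items outside the dominant prefix survive.
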

Pseudocode: popularity-based novelty of a recommendation list R_u
─────────────────────────────────────────────────────────────
Precompute p(i) = (#users who interacted with i) / |U|   for all items i

novelty(R_u):                           # assumes R_u is non-empty and p(i) > 0 for every i in R_u
    s ← 0
    for each item i in R_u:
        s ← s + ( -log2( p(i) ) )      # self-information; rarer ⇒ larger
    return s / |R_u|                    # mean over the list (↑ better)
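
The pseudocode above translates almost line for line into Python. The following is a minimal sketch, assuming interactions arrive as (user, item) pairs; the function names and the toy data are hypothetical. Items with no recorded interactions have p(i) = 0 and undefined surprisal, and the sketch deliberately fails on them instead of picking a smoothing scheme.

```python
import math
from collections import defaultdict


def item_popularity(interactions, num_users):
    """p(i) = fraction of users who interacted with item i.

    `interactions` is an iterable of (user, item) pairs; repeated pairs
    count once because users are collected in a set per item.
    """
    users_per_item = defaultdict(set)
    for user, item in interactions:
        users_per_item[item].add(user)
    return {item: len(users) / num_users for item, users in users_per_item.items()}


def novelty(rec_list, popularity):
    """Mean self-information (in bits) of the recommended items."""
    if not rec_list:
        raise ValueError("rec_list must not be empty")
    # A KeyError here means the item has no interactions, i.e. p(i) = 0.
    return sum(-math.log2(popularity[i]) for i in rec_list) / len(rec_list)


# Hypothetical toy log: 4 users; "hit" reached all of them, "niche" only one.
interactions = [
    ("u1", "hit"), ("u2", "hit"), ("u3", "hit"), ("u4", "hit"),
    ("u1", "mid"), ("u2", "mid"),
    ("u3", "niche"),
]
p = item_popularity(interactions, num_users=4)

popular_list = novelty(["hit", "mid"], p)
long_tail_list = novelty(["mid", "niche"], p)
```

In this toy log the popularities are 1, 0.5, and 0.25, giving surprisals of 0, 1, and 2 bits. The popular list therefore averages 0.5 bits and the long-tail list 1.5 bits.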

Connections

Appears In