Smoothing
Smoothing is a technique used in language models for information retrieval to adjust probability estimates. Its primary goal is to prevent zero-probability estimates for terms that do not appear in a specific document but are present in the general collection, while also accounting for the document’s content.
Why Smooth?
Without smoothing, if a document is missing even one term from a multi-word query, the language model assigns the whole query a probability of 0, since $P(q \mid d) = \prod_{w \in q} P_{ml}(w \mid d) = 0$ whenever any factor is 0. Smoothing “steals” a small amount of probability mass from seen terms and redistributes it to unseen terms using the background collection model.
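The zero-probability problem is easy to demonstrate. A minimal sketch with a hypothetical toy document and query (all data here is made up for illustration):

```python
from collections import Counter

# Hypothetical toy document and query (illustration only).
doc = "the cat sat on the mat".split()
query = ["cat", "dog"]  # "dog" does not occur in the document

counts = Counter(doc)
doc_len = len(doc)

# Unsmoothed maximum-likelihood query likelihood:
# a single missing term zeroes out the entire product.
score = 1.0
for term in query:
    score *= counts[term] / doc_len  # Counter returns 0 for unseen terms
print(score)  # 0.0, because P("dog" | d) = 0
```

The document would score 0 no matter how well it matches the rest of the query, which is exactly what smoothing prevents.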
Common Smoothing Methods
1. Jelinek-Mercer (Linear Interpolation)
Mixes the document model with the collection model using a fixed weight $\lambda \in [0, 1]$:

$P(w \mid d) = (1 - \lambda)\, P_{ml}(w \mid d) + \lambda\, P(w \mid C)$
- $\lambda$: Higher values (e.g., 0.7) favor the collection (better for long queries); lower values (e.g., 0.1) favor the document (better for short queries).
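The interpolation above can be sketched as follows, using a hypothetical two-document toy collection (function and variable names are illustrative, not from any library):

```python
from collections import Counter

def jelinek_mercer(term, doc_counts, doc_len, coll_counts, coll_len, lam=0.5):
    """P(w|d) = (1 - lam) * P_ml(w|d) + lam * P(w|C).  (sketch)"""
    p_doc = doc_counts[term] / doc_len
    p_coll = coll_counts[term] / coll_len
    return (1 - lam) * p_doc + lam * p_coll

# Hypothetical two-document toy collection (illustration only).
docs = ["the cat sat on the mat".split(), "the dog barked".split()]
coll_counts = Counter(w for d in docs for w in d)
coll_len = sum(coll_counts.values())
dc, dl = Counter(docs[0]), len(docs[0])

# "dog" is unseen in docs[0] but still gets nonzero probability
# because the collection model contributes lam * P("dog" | C).
print(jelinek_mercer("dog", dc, dl, coll_counts, coll_len, lam=0.5))
```

With `lam=0.0` the score collapses back to the unsmoothed maximum-likelihood estimate; with `lam=1.0` every document scores identically.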
2. Dirichlet Prior
Uses a pseudo-count $\mu$ from the collection:

$P(w \mid d) = \dfrac{c(w; d) + \mu\, P(w \mid C)}{|d| + \mu}$

where $c(w; d)$ is the count of $w$ in $d$ and $|d|$ is the document length.
- Intuition: As document length increases, the influence of the prior diminishes. It provides stronger length normalization than Jelinek-Mercer.
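The length-normalization intuition can be sketched as follows, again on hypothetical toy data (a small $\mu$ is used so the effect is visible at this scale; values around 2000 are commonly reported for real collections):

```python
from collections import Counter

def dirichlet(term, doc_counts, doc_len, coll_counts, coll_len, mu=2000):
    """P(w|d) = (c(w;d) + mu * P(w|C)) / (|d| + mu).  (sketch)"""
    p_coll = coll_counts[term] / coll_len
    return (doc_counts[term] + mu * p_coll) / (doc_len + mu)

# Hypothetical two-document toy collection (illustration only).
docs = ["the cat sat on the mat".split(), "the dog barked".split()]
coll_counts = Counter(w for d in docs for w in d)
coll_len = sum(coll_counts.values())
dc, dl = Counter(docs[0]), len(docs[0])

# The effective weight on the collection model is mu / (|d| + mu):
# the longer the document, the less the prior matters.
print(dirichlet("dog", dc, dl, coll_counts, coll_len, mu=10))
```

Unlike Jelinek-Mercer, the amount of smoothing here adapts automatically to document length rather than being fixed by a global weight.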
3. Absolute Discounting
Subtracts a constant $\delta$ from seen term counts:

$P(w \mid d) = \dfrac{\max(c(w; d) - \delta,\, 0)}{|d|} + \dfrac{\delta\, |d|_u}{|d|}\, P(w \mid C)$

where $|d|_u$ is the number of unique terms in the document.
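A minimal sketch of discounting on the same kind of hypothetical toy data (the discounted mass $\delta |d|_u / |d|$ is exactly what gets redistributed to the collection model, so the result remains a probability distribution):

```python
from collections import Counter

def absolute_discount(term, doc_counts, doc_len, coll_counts, coll_len, delta=0.7):
    """P(w|d) = max(c(w;d) - delta, 0)/|d| + (delta * |d|_u / |d|) * P(w|C).  (sketch)"""
    n_unique = len(doc_counts)  # |d|_u: unique terms in the document
    p_coll = coll_counts[term] / coll_len
    return (max(doc_counts[term] - delta, 0) / doc_len
            + delta * n_unique / doc_len * p_coll)

# Hypothetical two-document toy collection (illustration only).
docs = ["the cat sat on the mat".split(), "the dog barked".split()]
coll_counts = Counter(w for d in docs for w in d)
coll_len = sum(coll_counts.values())
dc, dl = Counter(docs[0]), len(docs[0])

# Summing over the whole vocabulary shows the probabilities still sum to 1:
# the mass subtracted from seen terms equals the mass given to the prior.
print(sum(absolute_discount(w, dc, dl, coll_counts, coll_len) for w in coll_counts))
```

Note $\delta$ should not exceed the smallest seen count (the `max(..., 0)` guards against that), otherwise seen terms would be over-discounted.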
Connections
- Foundation: Query Likelihood Model
- Part of: Language Models for IR
- Relates to: TF-IDF (smoothing behaves similarly to IDF by downweighting common terms)