Rocchio Algorithm

The Rocchio Algorithm is a classic technique for implementing Relevance Feedback. It updates an initial query vector by moving it toward the centroid of known relevant documents and away from the centroid of known non-relevant documents.

Rocchio Query Update

$q_{m} = \alpha q_{0} + \beta \frac{1}{|D_{r}|} \sum_{d \in D_{r}} d - \gamma \frac{1}{|D_{nr}|} \sum_{d \in D_{nr}} d$

where:

$q_{m}$ — the modified query vector

$q_{0}$ — the original query vector

$D_{r}$ — set of known relevant documents

$D_{n r}$ — set of known non-relevant documents

$α, β, γ$ — weights (hyperparameters) controlling the balance between original intent, positive feedback, and negative feedback.
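The update can be sketched in a few lines of Python. This is a minimal illustration on plain lists, not a reference implementation: the names `centroid` and `rocchio` are made up here, and the default weights (1.0, 0.75, 0.15) are commonly quoted starting values rather than anything these notes prescribe. Treating an empty feedback set as a zero vector is also an assumption of this sketch.

```python
from typing import List, Sequence

Vector = List[float]


def centroid(docs: Sequence[Sequence[float]], dim: int) -> Vector:
    """Mean of the document vectors; the zero vector if the set is empty."""
    if not docs:
        return [0.0] * dim
    return [sum(d[i] for d in docs) / len(docs) for i in range(dim)]


def rocchio(
    q0: Sequence[float],
    relevant: Sequence[Sequence[float]],
    non_relevant: Sequence[Sequence[float]],
    alpha: float = 1.0,   # weight of the original query (illustrative default)
    beta: float = 0.75,   # weight of positive feedback (illustrative default)
    gamma: float = 0.15,  # weight of negative feedback (illustrative default)
) -> Vector:
    """Return q_m = alpha*q0 + beta*centroid(D_r) - gamma*centroid(D_nr)."""
    dim = len(q0)
    c_r = centroid(relevant, dim)
    c_nr = centroid(non_relevant, dim)
    return [alpha * q0[i] + beta * c_r[i] - gamma * c_nr[i] for i in range(dim)]
```

Because an empty set contributes a zero vector, calling `rocchio` with `non_relevant=[]` simply drops the negative-feedback term, which is the situation that arises when only positive feedback is available.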

Vector Space Navigation

Picture the Vector Space Model as a space in which every document is a point; the user’s query is a point too. When the user marks some results as “good,” Rocchio nudges the query point toward those documents and, when some results are marked as bad, away from those. Relevant documents are usually weighted more heavily than non-relevant ones ($\beta > \gamma$).
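As a small worked example (the vectors and weights are made up for illustration), take a two-term space with

$q_{0} = (1, 0)$, $D_{r} = \{(1, 1), (1, 3)\}$, $D_{nr} = \{(4, 0)\}$, $\alpha = 1$, $\beta = 0.5$, $\gamma = 0.25$

The centroids are $(1, 2)$ for the relevant set and $(4, 0)$ for the non-relevant set, so

$q_{m} = 1 \cdot (1, 0) + 0.5 \cdot (1, 2) - 0.25 \cdot (4, 0) = (0.5, 1)$

The query gains weight on the second term, which the relevant documents emphasize, and loses weight on the first term, which dominates the non-relevant document.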

Pseudo-Relevance Feedback (PRF)

Since users rarely provide explicit feedback, we often use Pseudo-Relevance Feedback:

1. Run the initial search with $q_{0}$.
2. Assume the top $K$ documents are relevant; there is no $D_{nr}$, so the $\gamma$ term drops out.
3. Apply Rocchio to expand the query into $q_{m}$.
4. Run a second search with $q_{m}$.
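The PRF steps above can be sketched as a toy Python loop. Everything named here is hypothetical: `search` is a stand-in ranker using cosine similarity over dense lists, `pseudo_relevance_feedback` and its parameter `k` are illustrative names, and the weights are example values. A real system would rank against an index rather than scan every document.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: Sequence[float], docs: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical ranker: document indices by descending cosine similarity."""
    return sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)


def pseudo_relevance_feedback(
    q0: Sequence[float],
    docs: Sequence[Sequence[float]],
    k: int = 2,
    alpha: float = 1.0,
    beta: float = 0.75,
) -> Tuple[Vector, List[int]]:
    # Step 1: run the initial search.
    first_pass = search(q0, docs)
    # Step 2: assume the top k results are relevant (no non-relevant set).
    top_k = [docs[i] for i in first_pass[:k]]
    if not top_k:
        return list(q0), first_pass
    # Step 3: Rocchio with positive feedback only, so there is no gamma term.
    dim = len(q0)
    centroid = [sum(d[j] for d in top_k) / len(top_k) for j in range(dim)]
    q_m = [alpha * q0[j] + beta * centroid[j] for j in range(dim)]
    # Step 4: run the second search with the expanded query.
    return q_m, search(q_m, docs)


# Toy collection over three terms; the query only mentions the first term.
docs = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
q_m, ranking = pseudo_relevance_feedback([1.0, 0.0, 0.0], docs, k=2)
```

In this toy run the expanded query picks up weight on the second term from the top-ranked documents, so the third document, which shares no term with the original query, now gets a non-zero score. The same mechanism is why PRF can drift when the top $K$ results are not actually relevant.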

Connections

Context: Used within the Vector Space Model (VSM).
Related concepts: BM25 (which typically doesn’t use vector centroids), Query Expansion.
Modern link: Recent PRF methods often use Transformer models to expand the query instead of vector arithmetic.

Appears In

IR-L03 - Retrieval Models
