Rocchio Algorithm
Rocchio Algorithm
The Rocchio Algorithm is a classic technique for implementing Relevance Feedback. It updates an initial query vector by moving it toward the centroid of known relevant documents and away from the centroid of known non-relevant documents.
Rocchio Query Update
where:
- — the modified query vector
- — the original query vector
- — set of known relevant documents
- — set of known non-relevant documents
- — weights (hyperparameters) controlling the balance between original intent, positive feedback, and negative feedback.
Vector Space Navigation
Imagine the Vector Space Model where points represent documents. A user’s query is also a point. If the user marks some results as “good,” Rocchio literally “nudges” the query point closer to those good results. Usually, we weigh relevant documents more heavily than non-relevant ones ().
Pseudo-Relevance Feedback (PRF)
Since users rarely provide explicit feedback, we often use Pseudo-Relevance Feedback:
- Run initial search.
- Assume the top documents are relevant (no ).
- Apply Rocchio to expand the query.
- Run second search with .
Connections
- Context: Used within the Vector Space Model (VSM).
- Related concepts: BM25 (which typically doesn’t use vectorcentroids), Query Expansion.
- Modern link: Modern PRF methods often use Transformers to expand the query instead of vector arithmetic.