BERT for IR

BERT for IR refers to the application of the Bidirectional Encoder Representations from Transformers (BERT) architecture to search tasks. BERT allows the system to understand the context of query and document terms, moving beyond exact keyword matching to semantic understanding.

Primary Architectures

  1. Cross-Encoder (MonoBERT): Query and document are concatenated as a single input: [CLS] Query [SEP] Document [SEP].
    • Score = Linear(h_[CLS]), a learned linear layer over the final [CLS] embedding of the joint input.
    • High accuracy, very slow (the full model must run once per query–document pair).
  2. Bi-Encoder (DPR / Dense Retrieval): Query and document are encoded separately into single vectors.
    • Score = E_q(Query) · E_d(Document), the dot product (or cosine similarity) of the two vectors.
    • Fast (document vectors are pre-computed and searched with ANN, e.g. Faiss), but lower accuracy than cross-encoders.
  3. Late Interaction (ColBERT): Encodes query and document separately but keeps one vector per token rather than one per text.
    • Score = Σ_i max_j (q_i · d_j): for each query token, take its maximum similarity over all document tokens, then sum ("MaxSim").
    • Good balance of speed and accuracy.
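The three scoring functions above can be sketched in a few lines. This is a minimal illustration, not the official implementations: plain-Python toy vectors stand in for real BERT outputs, and the helper names (`dot`, `colbert_score`, etc.) are my own.

```python
# Sketch (not the official implementations) of the three scoring functions
# listed above, using toy vectors in place of real BERT embeddings.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross_encoder_score(cls_embedding, weights, bias=0.0):
    """MonoBERT-style: a learned linear layer over the joint [CLS] embedding."""
    return dot(cls_embedding, weights) + bias

def bi_encoder_score(q_vec, d_vec):
    """DPR-style: dot product of independently encoded query/document vectors."""
    return dot(q_vec, d_vec)

def colbert_score(q_tokens, d_tokens):
    """ColBERT MaxSim: for each query-token vector, take its maximum
    similarity over all document-token vectors, then sum."""
    return sum(max(dot(q, d) for d in d_tokens) for q in q_tokens)

# Toy example: 2 query tokens, 2 document tokens, 2-dim embeddings.
q_tokens = [[1.0, 0.0], [0.0, 1.0]]
d_tokens = [[1.0, 0.0], [0.0, 2.0]]
print(colbert_score(q_tokens, d_tokens))
```

Note how the cross-encoder needs the joint embedding (so it must re-run per pair), while the bi-encoder and ColBERT scores combine vectors that can be pre-computed offline.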

Context Matters

In keyword-based IR, “bank” in “river bank” and “bank account” is treated as the same term. BERT reads the whole sentence and produces a different contextual embedding for each of these two “banks.” This allows the search engine to match the user’s intent rather than just their words.
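A toy illustration of this effect, using made-up 3-dimensional vectors rather than real BERT outputs: a contextual model would place the finance senses of “bank” close together and the river sense far away.

```python
# Toy illustration with hypothetical 3-d vectors (NOT real BERT outputs):
# a contextual model assigns "bank" a different vector in each sentence.
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Hypothetical contextual embeddings of the token "bank":
bank_river   = [0.9, 0.1, 0.0]   # from "river bank"
bank_account = [0.1, 0.9, 0.2]   # from "bank account"
bank_branch  = [0.2, 0.8, 0.1]   # from "open a bank branch"

print(cosine(bank_river, bank_account))   # different senses -> low similarity
print(cosine(bank_account, bank_branch))  # same sense -> high similarity
```

A static (non-contextual) embedding table would assign all three occurrences the identical vector, making the two similarities equal by construction.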

The Pre-train/Fine-tune Paradigm

  • Pre-training: Learn general language patterns from massive corpora (Wikipedia, books).
  • Fine-tuning: Train the model on IR-specific data (like MS MARCO) to distinguish between relevant and irrelevant documents.
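The fine-tuning step typically optimizes a contrastive objective: the model is pushed to score a relevant document above irrelevant ones for the same query. A simplified DPR-style sketch (toy dot-product scores, my own helper names, not the exact published loss):

```python
# Sketch of a contrastive fine-tuning objective for dense retrieval
# (simplified DPR-style loss): softmax cross-entropy over one relevant
# document and several irrelevant ones, with dot products as toy scores.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def contrastive_loss(q_vec, pos_vec, neg_vecs):
    """Negative log-likelihood of the relevant document under a softmax
    over the relevant document plus the irrelevant ones."""
    scores = [dot(q_vec, pos_vec)] + [dot(q_vec, n) for n in neg_vecs]
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[0]  # -log p(relevant doc)

query      = [1.0, 0.0]
relevant   = [0.9, 0.1]                  # toy embedding of a relevant doc
irrelevant = [[0.0, 1.0], [-0.5, 0.5]]   # toy embeddings of irrelevant docs
print(contrastive_loss(query, relevant, irrelevant))
```

Minimizing this loss drives the query embedding toward relevant documents and away from irrelevant ones, which is exactly the geometry the bi-encoder's dot-product scoring relies on at search time.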
