Neural Reranking

Neural Reranking is the process of applying deep learning models (typically Transformers such as BERT) to re-evaluate and re-order the top-k results retrieved by a fast first-stage model (e.g., BM25).

Why Rerank?

First-stage retrieval (e.g., BM25) is fast but relies on exact keyword matching (lexical mismatch problem). Neural models are powerful but too slow to score millions of documents. Reranking combines the best of both: fast initial filtering followed by expensive but precise semantic scoring of the most promising candidates.
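The two-stage idea can be sketched in a few lines. This is a toy illustration: the lexical scorer (query-term overlap) and the "neural" scorer (character-trigram overlap) are made-up stand-ins for real BM25 and a real Transformer, chosen only to show the pipeline shape.

```python
from typing import List

def lexical_score(query: str, doc: str) -> float:
    # Stage 1 stand-in: fraction of query terms appearing in the document.
    q_terms = set(query.lower().split())
    return len(q_terms & set(doc.lower().split())) / max(len(q_terms), 1)

def neural_score(query: str, doc: str) -> float:
    # Stage 2 stand-in for an expensive cross-encoder: character-trigram
    # Jaccard overlap (a made-up proxy, not an actual neural model).
    grams = lambda s: {s[i:i + 3] for i in range(len(s) - 2)}
    q, d = grams(query.lower()), grams(doc.lower())
    return len(q & d) / max(len(q | d), 1)

def retrieve_then_rerank(query: str, corpus: List[str], k: int = 100) -> List[str]:
    # Stage 1: cheap scoring over the whole corpus, keep only the top-k.
    candidates = sorted(corpus, key=lambda d: lexical_score(query, d),
                        reverse=True)[:k]
    # Stage 2: expensive "neural" scoring of just those k candidates.
    return sorted(candidates, key=lambda d: neural_score(query, d),
                  reverse=True)
```

The point is the cost structure: the cheap scorer touches every document, while the expensive scorer only ever sees k of them.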

Architectures: Bi-Encoders vs. Cross-Encoders

Neural rerankers typically follow one of two paradigms, balancing performance and efficiency:

1. Bi-Encoders (Dual Encoders)

  • Mechanism: Query and document are encoded independently into vectors v_q and v_d. Scoring is a simple dot product or cosine similarity between them.
  • Trade-off: Lower precision (no interaction between query and document tokens) but extremely fast.
  • Used in: First-stage Dense Retrieval.
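A minimal sketch of the bi-encoder scoring pattern, assuming a hypothetical `embed` function (here a hashed bag-of-words with L2 normalization, standing in for a learned encoder):

```python
import zlib
import numpy as np

DIM = 64  # toy embedding size; real encoders use hundreds of dimensions

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in for a learned encoder: hash each token into a
    # fixed-size bag-of-words vector, then L2-normalize so the dot product
    # below equals cosine similarity.
    v = np.zeros(DIM)
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % DIM] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

# Key property: documents are encoded once, offline. At query time only the
# query is embedded and scored against the precomputed matrix.
docs = ["neural reranking with transformers", "bm25 keyword retrieval"]
doc_matrix = np.stack([embed(d) for d in docs])  # shape: (num_docs, DIM)

query_vec = embed("neural reranking")
scores = doc_matrix @ query_vec                  # one dot product per document
```

Because q and d never see each other until the final dot product, scoring is a single matrix-vector multiply, which is what makes bi-encoders fast enough for first-stage retrieval.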

2. Cross-Encoders

  • Mechanism: The query and document are concatenated as a single input and passed through the model.
  • Trade-off: High precision (all-to-all attention between query and document tokens) but very slow (latency scales with the number of candidates k, since each query–document pair requires a full forward pass).
  • Examples: MonoBERT, monoT5.
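The defining feature of a cross-encoder is that the score can depend on every (query token, document token) pair. The toy interaction function below (exact match plus a prefix-overlap bonus) is a made-up stand-in for learned attention; only the all-to-all loop structure is the point:

```python
def token_interaction(q_tok: str, d_tok: str) -> float:
    # Made-up interaction: full credit for an exact match, partial credit
    # for a shared prefix (e.g., "rerank" vs. "reranking").
    if q_tok == d_tok:
        return 1.0
    prefix = 0
    for a, b in zip(q_tok, d_tok):
        if a != b:
            break
        prefix += 1
    return 0.5 if prefix >= 4 else 0.0

def cross_score(query: str, doc: str) -> float:
    q_toks = query.lower().split()
    d_toks = doc.lower().split()
    # All-to-all interaction: cost grows with len(query) * len(doc), and a
    # separate pass is needed for every candidate document -- the source of
    # the cross-encoder's latency problem.
    return sum(token_interaction(q, d) for q in q_toks for d in d_toks)
```

A bi-encoder cannot express such pairwise token comparisons, because each side is compressed to a single vector before the two ever interact.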

Performance vs. Latency

Cross-encoders are the state of the art for ranking quality, but their computational cost forces the reranking list to stay small: only the top k candidates from the first stage are rescored.
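A back-of-envelope latency model makes the constraint concrete. The per-pass cost below is an assumed illustrative number, not a measurement:

```python
PER_PASS_MS = 10.0  # assumed cost of one cross-encoder forward pass, not measured

def rerank_latency_ms(k: int, per_pass_ms: float = PER_PASS_MS) -> float:
    # Sequential reranking cost grows linearly in the candidate count k.
    return k * per_pass_ms

for k in (10, 100, 1000):
    print(f"k={k:5d} -> ~{rerank_latency_ms(k) / 1000:.1f} s of sequential reranking")
```

Under this assumption, reranking 1000 candidates sequentially would cost on the order of ten seconds, while 100 stays near one second, which is why k is usually kept in the tens to low hundreds (batching and faster hardware shift the constant, not the linear scaling).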
