Learned Sparse Retrieval

Learned Sparse Retrieval (LSR) uses neural models to produce sparse, term-weighted representations of documents and queries. Because these representations can be stored in and searched with traditional inverted index infrastructure, LSR combines neural effectiveness with the efficiency of sparse retrieval.
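To make the storage and search step concrete, here is a minimal sketch of an inverted index whose postings carry learned weights, with documents scored by a sparse dot product. The function names (build_index, search) and all weights are hypothetical, chosen for illustration; a production system would use an engine such as Lucene rather than in-memory dictionaries.

```python
from collections import defaultdict

def build_index(doc_vectors):
    """Map each term to a postings list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vector in doc_vectors.items():
        for term, weight in vector.items():
            if weight > 0:  # zero-weight terms are never stored
                index[term].append((doc_id, weight))
    return index

def search(index, query_vector, k=10):
    """Score documents by the dot product of query and document weights."""
    scores = defaultdict(float)
    for term, q_weight in query_vector.items():
        # Only postings for the query's own terms are visited.
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]

# Hypothetical weights, as if produced by a learned sparse encoder.
docs = {
    "d1": {"neural": 2.0, "retrieval": 1.5, "search": 0.5},
    "d2": {"cooking": 2.0, "recipe": 1.5},
}
index = build_index(docs)
results = search(index, {"neural": 1.0, "search": 0.5})
```

Only documents that share at least one term with the query are touched, which is where the efficiency of sparse retrieval comes from.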

Approaches

Method                    | Type               | Key Idea
doc2query / docTTTTTquery | Document expansion | Generate likely queries with a seq2seq model and append them as expansion terms
DeepCT                    | Term reweighting   | Predict context-aware term importance
DeepImpact                | Term reweighting   | Predict an impact score for each term
uniCOIL                   | Token weights      | Single scalar importance weight per token
SPLADE                    | Full vocabulary    | Log-saturated weights over the entire vocabulary
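The log-saturated weighting listed for SPLADE can be sketched as follows. This assumes the encoder has already produced one row of vocabulary logits per input token (in SPLADE these come from a masked-language-model head); the logits, the three-word vocabulary, and the function name splade_weights are made up for illustration. Max pooling over tokens is used here, as in later SPLADE variants, while the original formulation sums instead.

```python
import math

def splade_weights(token_logits, vocab):
    """Aggregate per-token vocabulary logits into one sparse vector.

    token_logits: one list of logits (length == len(vocab)) per input token.
    Each weight is max over tokens of log(1 + ReLU(logit)).
    """
    weights = {}
    for j, term in enumerate(vocab):
        w = max(math.log1p(max(0.0, logits[j])) for logits in token_logits)
        if w > 0:  # keep only non-zero entries so the vector stays sparse
            weights[term] = w
    return weights

# Hypothetical logits for a two-token input over a three-word vocabulary.
vocab = ["car", "vehicle", "banana"]
token_logits = [
    [3.0, 1.0, -2.0],
    [0.5, 2.0, -1.0],
]
vec = splade_weights(token_logits, vocab)
```

The ReLU removes terms with negative evidence, and the logarithm dampens large logits so that no single term dominates the score.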

Why LSR?

Best of Both Worlds

  • Like BM25: Uses an inverted index → fast, scalable retrieval
  • Like dense retrieval: Learns semantic term importance from data
  • Plus expansion: Can add semantically related terms not in the original text
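The effect of expansion on vocabulary mismatch can be shown with a toy example. All terms and weights below are invented, and the expansion step is assumed rather than implemented: a document that says "car repair shop" only matches the query term "automobile" once a related term has been added to its sparse vector.

```python
def dot(query, doc):
    """Sparse dot product over the terms the two vectors share."""
    return sum(w * doc[t] for t, w in query.items() if t in doc)

query = {"automobile": 1.0, "repair": 1.0}

# Weights for terms that literally appear in "car repair shop".
reweighted = {"car": 2.0, "repair": 1.5, "shop": 0.5}

# Same document after a hypothetical expansion step adds related terms.
expanded = {**reweighted, "automobile": 1.0, "mechanic": 0.5}

score_without = dot(query, reweighted)  # only "repair" matches
score_with = dot(query, expanded)       # "automobile" now matches too
```

With reweighting alone the score reflects only the shared term "repair"; the expanded vector also rewards the match on "automobile".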
