Retrieval-Augmented Generation

Jun 06, 2026 · 2 min read

neural-ir

Retrieval-Augmented Generation (RAG)

RAG is a framework that combines a retrieval model (which finds relevant documents) with a generative model (like an LLM). Instead of relying solely on its internal knowledge, the generator “reads” the retrieved documents to provide more accurate, grounded, and up-to-date answers.

The RAG Pipeline

Retrieve: Given a query $q$, find the top-$k$ relevant documents $D = \{d_1, \dots, d_k\}$ from a knowledge base.

Augment: Create an expanded prompt: Context: [D] Question: [q].

Generate: The LLM produces an answer $a$ conditioned on the query and the retrieved context: $P(a \mid q, D)$.
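The three steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a reference implementation: the knowledge base is made up, `retrieve` ranks by plain term overlap as a stand-in for a real retriever, and `generate` is a placeholder that returns the top context line where a real system would call an LLM. All function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy knowledge base; the documents are illustrative only.
KNOWLEDGE_BASE = [
    "RAG combines a retriever with a generative model.",
    "BM25 is a lexical ranking function used in search engines.",
    "Dense retrieval encodes queries and documents as vectors.",
]


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return [t.strip(".,?!:").lower() for t in text.split()]


def retrieve(query, docs, k=2):
    """Retrieve: return the top-k docs by term overlap with the query.

    Term overlap is an assumption made for brevity; a real system
    would use BM25 or a dense retriever here.
    """
    q_terms = Counter(tokenize(query))

    def score(doc):
        d_terms = Counter(tokenize(doc))
        return sum(min(q_terms[t], d_terms[t]) for t in q_terms)

    # sorted() is stable, so ties keep their knowledge-base order.
    return sorted(docs, key=score, reverse=True)[:k]


def augment(query, docs):
    """Augment: build the expanded prompt 'Context: [D] Question: [q]'."""
    context = "\n".join(docs)
    return f"Context: {context}\nQuestion: {query}"


def generate(prompt):
    """Generate: placeholder for an LLM call.

    It returns the first context line so the sketch runs without a model.
    """
    first_line = prompt.splitlines()[0]
    return first_line[len("Context: "):]


if __name__ == "__main__":
    question = "What is BM25?"
    top_docs = retrieve(question, KNOWLEDGE_BASE, k=2)
    prompt = augment(question, top_docs)
    print(prompt)
    print(generate(prompt))
```

Keeping the three stages as separate functions mirrors the pipeline: the retriever or the generator can be swapped without touching the other.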

Open Book Exam

An LLM used without retrieval (like GPT-4 on its own) is like a student taking an exam from memory: it may misremember facts (hallucinate). RAG turns the exam into an “open book” test. The system searches the library, finds the right page, and then writes an answer based on what it is looking at.

Why Use RAG?

Factuality: Reduces “hallucinations” by grounding the answer in retrieved text, which the model can also be prompted to cite.
Freshness: You don’t need to retrain the model to update its knowledge; you just update the documents in the retriever’s index.
Transparency: Users can see the source documents used to generate the answer.
Privacy: Allows LLMs to answer questions about private data without that data ever being part of the model’s training set.
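The freshness and privacy points can be illustrated with a minimal sketch: new or private knowledge reaches the system by adding a document to the index, and no model weights are involved. The `InMemoryIndex` class, its word-overlap scoring, and the example documents are all hypothetical and chosen only to keep the example short.

```python
class InMemoryIndex:
    """Toy document index (hypothetical); scores by shared lowercase words."""

    def __init__(self):
        self.docs = []

    def add(self, doc):
        """Updating knowledge is just an index write; nothing is retrained."""
        self.docs.append(doc)

    def search(self, query, k=1):
        """Return up to k documents that share at least one word with the query."""
        q_words = set(query.lower().split())
        scored = [(len(q_words & set(d.lower().split())), d) for d in self.docs]
        matches = [(s, d) for s, d in scored if s > 0]
        matches.sort(key=lambda pair: pair[0], reverse=True)
        return [d for _, d in matches[:k]]


if __name__ == "__main__":
    index = InMemoryIndex()
    index.add("the office opens at nine")

    # Before the update, nothing relevant can be retrieved.
    print(index.search("holiday schedule 2026"))

    # After adding one document, the same query finds it.
    index.add("holiday schedule 2026 lists ten days")
    print(index.search("holiday schedule 2026"))
```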

Connections

Retriever part: Can use BM25 (sparse lexical matching), BERT for IR (dense retrieval), or uniCOIL (learned sparse retrieval).
Generator part: Large Language Models (LLMs).
Problems addressed: Hallucination and outdated information stored in the model's weights.
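As one concrete option for the retriever part, BM25 can be sketched as below. Several details are assumptions rather than anything specified in these notes: tokenization is a plain lowercase whitespace split, the idf uses the Lucene-style form with `+ 1` inside the logarithm, and `k1 = 1.2`, `b = 0.75` are common default values. The function name and example documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with BM25.

    Assumes whitespace tokenization and the Lucene-style idf:
    idf(t) = ln((N - df(t) + 0.5) / (df(t) + 0.5) + 1).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter(t for d in tokenized for t in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for t in query.lower().split():
            if t not in tf:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
            # Length normalization: longer-than-average docs are penalized.
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "retrieval augmented generation",
        "dense retrieval with bert",
        "cooking pasta at home",
    ]
    for doc, s in zip(docs, bm25_scores("dense retrieval", docs)):
        print(f"{s:.3f}  {doc}")
```

In this example the second document matches both query terms and should rank first, while the third shares no terms with the query and scores zero.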

Appears In

IR-L09 - RAG
