Retrieval-Augmented Fine-Tuning (RAFT)
Definition
RAFT, introduced by researchers at UC Berkeley, bridges the gap between RAG (which uses general LLMs not specifically trained for retrieval-grounded generation) and domain-specific fine-tuning (which trains on domain knowledge without retrieval context). Each RAFT training example consists of a question, one relevant 'oracle' document, 3-5 distractor documents that look plausible but don't contain the answer, and an answer with a chain-of-thought explanation citing the oracle document. Training on this mixture teaches the LLM two critical skills: (1) identifying which retrieved document contains the answer, and (2) generating an answer that explicitly references the supporting evidence.
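A minimal sketch of what one such training example might look like as a record, assuming a JSON-style schema (the field names, document IDs, and texts are illustrative, not a fixed RAFT standard):

```python
# One illustrative RAFT training example (schema and values are hypothetical).
raft_example = {
    "question": "How do I reset my API key?",
    "documents": [
        {"id": "D1", "text": "To reset your API key, open Settings > API ...", "oracle": True},
        {"id": "D2", "text": "Billing cycles renew on the first of each month ...", "oracle": False},
        {"id": "D3", "text": "API rate limits are 100 requests per minute ...", "oracle": False},
        {"id": "D4", "text": "Webhooks can be configured under Integrations ...", "oracle": False},
    ],
    # Chain-of-thought target: cite the oracle document, reason, then answer.
    "answer": (
        "Based on document [D1]: the Settings > API page exposes a reset "
        "option, so the key can be regenerated there. -> Navigate to "
        "Settings > API and reset the key."
    ),
}

# Exactly one oracle document per example, per the definition above.
assert sum(d["oracle"] for d in raft_example["documents"]) == 1
```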
Why It Matters
Standard fine-tuning on domain data teaches an LLM what to know but not how to use retrieved context effectively. A general instruction-tuned LLM used in RAG may ignore relevant context in favor of its parametric knowledge, or hallucinate by blending retrieved and memorized information. RAFT trains the model specifically for RAG-style inference, significantly improving its ability to identify relevant documents in a noisy retrieved set and generate faithful, grounded answers. For 99helpers customers in specialized domains (legal, medical, financial software), RAFT-fine-tuned models can dramatically outperform generic LLMs on domain-specific RAG benchmarks.
How It Works
RAFT training data construction: for each document in the knowledge base, use an LLM to generate 3-5 questions answerable from that document. For each question, include the relevant (oracle) document plus 3-5 randomly sampled irrelevant documents as distractors. Generate a chain-of-thought answer of the form 'Based on document [X]: [reasoning] → [answer]'. Fine-tune the LLM (e.g., Llama 3 or Mistral) on this dataset using standard supervised fine-tuning. The resulting model is specifically optimized for reasoning over and synthesizing from mixed-relevance retrieved context. RAFT-fine-tuned models outperform both RAG with a generic model and fine-tuned models without retrieval on domain-specific QA benchmarks.
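The construction loop described above can be sketched as follows. `generate_questions` and `generate_cot_answer` stand in for LLM calls and are assumptions, stubbed here so the pipeline shape is runnable, not part of any specific library:

```python
import random

def build_raft_dataset(corpus, num_questions=3, num_distractors=4, seed=0):
    """Build RAFT training examples from a list of {'id', 'text'} documents."""
    rng = random.Random(seed)
    dataset = []
    for doc in corpus:
        for question in generate_questions(doc, num_questions):
            # Sample distractors from the rest of the corpus.
            others = [d for d in corpus if d["id"] != doc["id"]]
            distractors = rng.sample(others, min(num_distractors, len(others)))
            # Shuffle so the oracle's position is not a learnable shortcut.
            context = [doc] + distractors
            rng.shuffle(context)
            dataset.append({
                "question": question,
                "documents": context,
                "answer": generate_cot_answer(question, oracle=doc),
            })
    return dataset

# Hypothetical stubs standing in for LLM-backed generation.
def generate_questions(doc, n):
    return [f"Question {i} about {doc['id']}?" for i in range(n)]

def generate_cot_answer(question, oracle):
    return f"Based on document [{oracle['id']}]: [reasoning] -> [answer]"
```

The resulting list of records can be fed to any standard supervised fine-tuning loop (e.g., formatting each record into a prompt/completion pair for an SFT trainer).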
RAFT — Retrieval Augmented Fine-Tuning
Training Data Format
- Question: "How do I reset my API key?"
- Context Documents: Doc A (oracle), Doc B (distractor), Doc C (distractor)
- Answer: "Navigate to Settings > API..."
RAFT Teaches the Model to
- Identify relevant passages: focus on context that answers the question
- Ignore distractor documents: skip documents that look related but are misleading
- Generate grounded answers: cite the specific context used
- Standard RAG: no fine-tuning; a generic instruction-tuned model reads the retrieved context
- RAFT: fine-tuned on question-documents-answer triples that include distractors
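At inference time, a RAFT-tuned model is prompted in the same shape it was trained on: retrieved documents first, then the question. A sketch of assembling that prompt (the template wording is illustrative and should match whatever format the model was actually fine-tuned on):

```python
def format_raft_prompt(question, documents):
    """Assemble retrieved documents plus the question into a RAFT-style prompt.

    `documents` is a list of {'id', 'text'} records from the retriever; the
    instruction text below is an assumption, not a fixed RAFT standard.
    """
    doc_block = "\n\n".join(f"[{d['id']}] {d['text']}" for d in documents)
    return (
        "Answer the question using only the documents that support it, "
        "citing the document you used.\n\n"
        f"Documents:\n{doc_block}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Because training examples shuffled oracle and distractor positions, the model should handle the oracle appearing anywhere in the retrieved set.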
Real-World Example
A 99helpers enterprise customer in the healthcare sector deploys a support chatbot answering questions about their EHR software. A general GPT-4 RAG system achieves 72% answer accuracy on their evaluation set. Using RAFT, the team constructs 50,000 training examples from their EHR documentation and fine-tunes Llama 3 8B. The RAFT model reaches 84% accuracy, surpassing the GPT-4 RAG baseline at roughly one-tenth the inference cost per query, which makes deployment at scale sustainable while meeting their latency requirements.
Common Mistakes
- ✕ Constructing RAFT training data with only relevant documents (no distractors): the distractor documents are essential for teaching the model to ignore irrelevant context.
- ✕ Fine-tuning on synthetic data without validating that generated questions are diverse and representative of real user queries.
- ✕ Using RAFT-fine-tuned models outside the domain they were trained on: domain-specific fine-tuning reduces generalization to other topics.
Related Terms
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.
Faithfulness
Faithfulness is a RAG evaluation metric that measures whether the information in a generated answer is fully supported by the retrieved context, quantifying how well the model avoids hallucination when given source documents.
Hallucination
Hallucination in AI refers to when a language model generates confident, plausible-sounding text that is factually incorrect, unsupported by the provided context, or entirely fabricated, posing a major reliability challenge for AI applications.
RAG Evaluation
RAG evaluation is the systematic measurement of a RAG system's quality across multiple dimensions — including retrieval accuracy, answer faithfulness, relevance, and completeness — to identify weaknesses and guide improvement.
Embedding Model
An embedding model is a machine learning model that converts text (or other data) into dense numerical vectors that capture semantic meaning, enabling similarity search and serving as the foundation of RAG retrieval systems.
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →