Retrieval-Augmented Fine-Tuning (RAFT)

Definition

RAFT, introduced by researchers at UC Berkeley, bridges the gap between RAG (which uses general LLMs not specifically trained for retrieval-grounded generation) and domain-specific fine-tuning (which trains on domain knowledge without retrieval context). In RAFT training, each example consists of a question, 1 relevant document ('oracle'), 3-5 irrelevant distractor documents (that look plausible but don't contain the answer), and the answer with a chain-of-thought explanation citing the oracle document. Training on this mixture teaches the LLM two critical skills: (1) identifying which retrieved document contains the answer, and (2) generating an answer that explicitly references the supporting evidence.
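The structure described above can be sketched as a single training example. This is a minimal illustration, assuming a simple dict layout; the field names and document texts are invented for the example, not a fixed schema.

```python
# One RAFT training example: a question, one oracle document, several
# distractors, and a chain-of-thought answer that cites the oracle.
raft_example = {
    "question": "How do I reset my API key?",
    "oracle_document": "To reset your API key, go to Settings > API and click 'Regenerate'.",
    "distractor_documents": [
        "Billing details can be updated under Settings > Billing.",
        "Webhooks are configured per project in the dashboard.",
        "Two-factor authentication is enabled by default for all accounts.",
    ],
    "answer": (
        "Based on document [1]: the API settings page provides a "
        "'Regenerate' action, so the answer is: navigate to Settings > API "
        "and click 'Regenerate'."
    ),
}

# Sanity checks mirroring the definition: 3-5 distractors and an answer
# that explicitly cites the supporting document.
assert 3 <= len(raft_example["distractor_documents"]) <= 5
assert "Based on document" in raft_example["answer"]
```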

Why It Matters

Standard fine-tuning on domain data teaches an LLM what to know but not how to use retrieved context effectively. A general instruction-tuned LLM used in RAG may ignore relevant context in favor of its parametric knowledge, or hallucinate by blending retrieved and memorized information. RAFT trains the model specifically for RAG-style inference, significantly improving its ability to identify relevant documents in a noisy retrieved set and generate faithful, grounded answers. For 99helpers customers in specialized domains (legal, medical, financial software), RAFT-fine-tuned models can dramatically outperform generic LLMs on domain-specific RAG benchmarks.

How It Works

RAFT training data construction: for each document in the knowledge base, use an LLM to generate 3-5 questions answerable from that document. For each question, include the relevant (oracle) document plus 3-5 randomly sampled irrelevant documents as distractors. Generate a chain-of-thought answer of the form 'Based on document [X]: [reasoning] → [answer]'. Fine-tune the LLM (e.g., Llama 3 or Mistral) on this dataset using standard supervised fine-tuning. The resulting model is specifically optimized for answering from mixed-relevance retrieved context. RAFT-fine-tuned models outperform both RAG with a generic model and fine-tuned models without RAG on domain-specific QA benchmarks.
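The construction loop above can be sketched in a few lines. This is a hedged outline, not a reference implementation: `generate_questions(doc)` and `generate_cot_answer(question, doc)` are hypothetical callables standing in for LLM calls, and the 3-5 question/distractor counts follow the description above.

```python
import random

def build_raft_dataset(corpus, generate_questions, generate_cot_answer,
                       n_distractors=4, seed=0):
    """Build RAFT training examples: one oracle document per question,
    plus randomly sampled distractors, shuffled so the oracle's position
    in the context varies across examples."""
    rng = random.Random(seed)
    dataset = []
    for i, oracle in enumerate(corpus):
        others = corpus[:i] + corpus[i + 1:]           # distractor candidates
        for question in generate_questions(oracle):    # e.g. 3-5 per document
            distractors = rng.sample(others, k=min(n_distractors, len(others)))
            context = [oracle] + distractors
            rng.shuffle(context)                       # oracle position varies
            dataset.append({
                "question": question,
                "context": context,
                "answer": generate_cot_answer(question, oracle),
            })
    return dataset
```

In practice the two generator callables would wrap prompted LLM calls, and the resulting dataset would be serialized into the prompt/completion format expected by the fine-tuning framework.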

RAFT — Retrieval-Augmented Fine-Tuning

Training Data Format

Question

"How do I reset my API key?"

Context Documents

Doc A (oracle, contains the answer), Doc B (distractor), Doc C (distractor)

Answer

"Navigate to Settings > API..."

RAFT Teaches the Model to

1. Identify relevant passages: focus on context that answers the question.

2. Ignore distractor documents: skip documents that look related but are misleading.

3. Generate grounded answers: cite the specific context used.

Standard RAG (no fine-tuning)

  • Mixed docs handling: Inconsistent
  • Distractor resistance: Low
  • Citation quality: Variable

RAFT (fine-tuned on RAG triples)

  • Mixed docs handling: Strong
  • Distractor resistance: High
  • Citation quality: Consistent

Real-World Example

A 99helpers enterprise customer in the healthcare sector deploys a support chatbot answering questions about their EHR software. A general GPT-4 RAG system achieves 72% answer accuracy on their evaluation set. Using RAFT, the team constructs 50,000 training examples from their EHR documentation and fine-tunes Llama 3 8B. The RAFT model achieves 84% accuracy, surpassing the GPT-4 RAG baseline, at roughly one-tenth the inference cost per query, enabling sustainable deployment at scale while meeting their latency requirements.

Common Mistakes

  • Constructing RAFT training data with only relevant documents (no distractors)—the distractor documents are essential for teaching the model to ignore irrelevant context.
  • Fine-tuning on synthetic data without validating that generated questions are diverse and representative of real user queries.
  • Using RAFT-fine-tuned models outside the domain they were trained on—domain-specific fine-tuning reduces generalization to other topics.
