Hallucination
Definition
Hallucination occurs when an LLM produces output that contradicts facts, invents information not present in the source material, or confidently states things that are simply wrong. Hallucinations stem from how LLMs work: they predict statistically likely next tokens based on patterns in training data, which can produce fluent, plausible text even when no factual basis exists. Types include: factual hallucination (inventing false facts), faithfulness hallucination (contradicting the provided context), and entity hallucination (fabricating names, dates, or identifiers). In RAG systems, hallucination can occur even when relevant context is retrieved — the model may fail to use the context correctly, blend it with incorrect parametric knowledge, or contradict it outright.
Why It Matters
Hallucination is the primary reliability risk in AI applications, and it is especially dangerous in customer support contexts where incorrect information can mislead users, cause incorrect actions, or damage trust. A customer who acts on a hallucinated answer about billing, cancellation policy, or product capability will have a negative experience regardless of how helpful the AI seemed. RAG reduces but does not eliminate hallucination by grounding the model in retrieved context. Mitigating hallucination requires a combination of retrieval quality (providing correct context), prompt design (instructing the model to stay grounded), and evaluation (measuring faithfulness scores).
How It Works
Hallucination in RAG systems is detected through faithfulness evaluation: comparing the generated answer against the retrieved context to verify that every claim in the answer is supported by that context. Tools like RAGAS measure faithfulness by having a second LLM extract the claims in the generated answer and check each one against the retrieved documents. Mitigation strategies include explicit prompting ('Answer only from the provided context. If the answer is not in the context, say so'), temperature reduction (lower temperature reduces sampling randomness, making outputs more deterministic), fine-tuning on grounded response examples, and citation requirements (asking the model to cite the specific context passages it used).
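The claim-level check that tools like RAGAS perform can be sketched as follows. This is a minimal illustration, not the RAGAS API: in practice both claim extraction and the support judgment are done by an LLM, so the sentence splitter and lexical-overlap check below are hypothetical stand-ins that keep the pipeline runnable.

```python
# Sketch of claim-level faithfulness checking. The two helper functions
# are naive stand-ins for the LLM judge used by real tools like RAGAS.

def split_claims(answer: str) -> list[str]:
    # Stand-in for LLM-based claim extraction: one claim per sentence.
    return [s.strip() for s in answer.split(".") if s.strip()]

def is_supported(claim: str, context: str, threshold: float = 0.6) -> bool:
    # Stand-in for the LLM judge: fraction of claim words found in context.
    words = {w.lower().strip(",") for w in claim.split()}
    hits = sum(1 for w in words if w in context.lower())
    return hits / len(words) >= threshold

def faithfulness(answer: str, context: str) -> float:
    # Fraction of claims in the answer that the context supports.
    claims = split_claims(answer)
    supported = sum(is_supported(c, context) for c in claims)
    return supported / len(claims)

context = (
    "The refund policy allows returns within 30 days of purchase. "
    "Refunds are processed within 5-7 business days."
)
answer = (
    "Returns are accepted within 30 days of purchase. "
    "Premium members get instant refunds."
)
score = faithfulness(answer, context)  # 1 of 2 claims supported -> 0.5
```

A faithfulness score below 1.0 flags that at least one claim in the answer has no support in the retrieved context.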
Hallucination Detection — Supported vs Fabricated Claims
Retrieved Context
The refund policy allows returns within 30 days of purchase.
Refunds are processed within 5-7 business days.
Digital purchases are non-refundable after download.
Contact support with your order ID to initiate a return.
Generated Response
Returns are accepted within 30 days. Launched in March 2022, our policy also covers digital purchases which are non-refundable.
Refunds take 5-7 business days. Premium members get instant refunds.
Submit your order ID to support to start the process. Digital items cannot be returned after download.
Claim Analysis
Supported claims: 6 / 8
Hallucination rate: 25%
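The numbers above reduce to simple arithmetic over per-claim verdicts. One possible decomposition of the generated response into eight claims (the labels here are an illustration; real pipelines produce verdicts with an LLM judge):

```python
# Per-claim verdicts for the generated response above:
# True = supported by the retrieved context, False = fabricated.
verdicts = [
    True,   # "Returns are accepted within 30 days."
    False,  # "Launched in March 2022" -- not in the context
    True,   # "our policy also covers digital purchases"
    True,   # "digital purchases ... non-refundable" (after download)
    True,   # "Refunds take 5-7 business days."
    False,  # "Premium members get instant refunds." -- not in the context
    True,   # "Submit your order ID to support to start the process."
    True,   # "Digital items cannot be returned after download."
]

supported = sum(verdicts)
hallucination_rate = 1 - supported / len(verdicts)
print(f"Supported claims: {supported} / {len(verdicts)}")  # 6 / 8
print(f"Hallucination rate: {hallucination_rate:.0%}")     # 25%
```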
Real-World Example
A 99helpers customer discovers through CSAT analysis that low-rated chatbot interactions cluster around billing and pricing questions. Manual review reveals the AI frequently gives subtly incorrect answers about pricing tiers — not retrieved from the knowledge base but generated from its parametric knowledge. They add an explicit instruction to the system prompt: 'For any pricing or billing question, you MUST quote directly from the provided context. If no pricing information is in the context, say you cannot confirm pricing details and offer to connect the customer with a human agent.' Hallucination rate on pricing questions drops from 18% to under 2%.
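The guard from this example can be wired in with a few lines. The keyword-based topic check below is a hypothetical simplification (production systems typically use an intent classifier), but the guard text itself is the instruction quoted above:

```python
# Sketch: conditionally append a grounding guard to the system prompt
# for pricing/billing questions. Keyword routing is illustrative only.

BASE_PROMPT = "You are a support assistant. Answer from the provided context."

PRICING_GUARD = (
    "For any pricing or billing question, you MUST quote directly from the "
    "provided context. If no pricing information is in the context, say you "
    "cannot confirm pricing details and offer to connect the customer with "
    "a human agent."
)

PRICING_KEYWORDS = ("price", "pricing", "billing", "cost", "tier", "refund")

def build_system_prompt(user_question: str) -> str:
    q = user_question.lower()
    if any(k in q for k in PRICING_KEYWORDS):
        return BASE_PROMPT + "\n\n" + PRICING_GUARD
    return BASE_PROMPT
```

Scoping the guard to the problem topic keeps the base prompt short for unrelated questions while tightening grounding exactly where hallucinations were observed.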
Common Mistakes
- ✕ Treating RAG as a complete solution to hallucination — RAG significantly reduces hallucination but does not eliminate it; ongoing evaluation and monitoring are required
- ✕ Relying only on low temperature to prevent hallucination — lowering temperature reduces sampling randomness but does not stop the model from drawing on incorrect parametric knowledge when the context is ambiguous
- ✕ Not testing for hallucination before deployment — run automated faithfulness evaluation alongside human spot-checks before making an AI chatbot publicly available
Related Terms
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.
Grounding
Grounding in AI refers to anchoring a language model's responses to specific, verifiable source documents or data, reducing hallucination by ensuring the model draws on retrieved evidence rather than relying on potentially incorrect parametric knowledge.
Faithfulness
Faithfulness is a RAG evaluation metric that measures whether the information in a generated answer is fully supported by the retrieved context, quantifying how well the model avoids hallucination when given source documents.
Retrieval Precision
Retrieval precision measures the fraction of retrieved documents that are actually relevant to the query. In RAG systems, high precision means the context passed to the LLM contains mostly useful information rather than noise.
RAG Evaluation
RAG evaluation is the systematic measurement of a RAG system's quality across multiple dimensions — including retrieval accuracy, answer faithfulness, relevance, and completeness — to identify weaknesses and guide improvement.