Question Answering
Definition
Question answering (QA) systems accept a natural language question and return a precise answer, optionally grounded in a source document or corpus. Extractive QA identifies the answer as a span within a provided passage (as in the SQuAD benchmark); generative QA produces free-form answers using language models. Open-domain QA first retrieves relevant documents from a corpus and then extracts or generates an answer from them. Modern QA systems typically pair a retriever (BM25 or dense embeddings) that fetches relevant context with a reader (a transformer) that extracts or generates the answer. QA is the core technology behind RAG systems and AI chatbots that answer knowledge base queries.
Why It Matters
Question answering is the fundamental capability that makes AI chatbots useful for knowledge retrieval. When a user asks 'How do I reset my password?' or 'What is the refund policy?', a QA system can extract the precise answer from documentation rather than returning a list of potentially relevant links. This transforms the user experience from search to conversation. Accurate QA also reduces support ticket volume by deflecting queries to self-service, directly lowering operational costs.
How It Works
Modern open-domain QA uses a two-stage retrieve-then-read pipeline. The retriever (typically a BM25 index or a bi-encoder dense retrieval model) selects the top-k passages from a large corpus based on query similarity. The reader (a cross-encoder transformer like BERT fine-tuned on SQuAD) processes each passage with the question and predicts the answer start/end positions within the passage. For generative QA, a seq2seq model generates the answer conditioned on retrieved passages. RAG combines both paradigms for grounded, fluent answers.
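The retrieval stage described above can be sketched in plain Python. This is a minimal, illustrative BM25 implementation over a toy in-memory corpus (the documents and query are made up for the example); a production retriever would use a search index such as Elasticsearch or a dense bi-encoder instead.

```python
import math
from collections import Counter

# Toy document store; in production this would be a search index.
CORPUS = [
    "OpenAI was founded in December 2015 by Sam Altman and others.",
    "The messages endpoint is limited to 100 requests per minute per API key.",
    "Refunds are issued within 14 days of purchase.",
]

def tokenize(text):
    return [t.strip(".,?").lower() for t in text.split()]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Classic BM25: weight rare query terms higher (IDF), damp raw term
    frequency, and normalize by document length."""
    doc_tokens = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in doc_tokens) / n
    df = Counter(term for toks in doc_tokens for term in set(toks))
    scores = []
    for toks in doc_tokens:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
                score += idf * tf[term] * (k1 + 1) / (
                    tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores

def retrieve(query, docs, k=2):
    """Return the top-k passages for the query, best first."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=scores.__getitem__, reverse=True)
    return [docs[i] for i in ranked[:k]]

top = retrieve("When was OpenAI founded?", CORPUS, k=1)
```

The passage returned here is what gets handed to the reader in the second stage of the pipeline.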
Extractive Question Answering — Answer Span Detection
Question
When was OpenAI founded?
Context Passage (answer span highlighted)
OpenAI was founded in December 2015 by Sam Altman, Greg Brockman, Elon Musk, and others. The company released GPT-4 in March 2023, which demonstrated strong performance across many benchmarks. OpenAI is headquartered in San Francisco, California.
Extracted Answer
December 2015
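Under the hood, an extractive reader scores every token in the passage as a possible answer start and end, then selects the highest-scoring valid span. A sketch of that span-selection step, using made-up logits (a real transformer reader computes these from the question and passage jointly):

```python
# Hypothetical per-token scores, as a reader model might output them.
tokens = ["OpenAI", "was", "founded", "in", "December", "2015", "by", "Sam", "Altman"]
start_logits = [0.1, 0.0, 0.2, 0.3, 5.0, 0.4, 0.1, 0.2, 0.1]
end_logits   = [0.0, 0.1, 0.1, 0.2, 0.3, 4.8, 0.2, 0.1, 0.3]

def best_span(start_logits, end_logits, max_len=10):
    """Pick the (start, end) pair maximizing start + end score,
    subject to end >= start and a maximum span length."""
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_logits):
        for e in range(s, min(s + max_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best[0], best[1]

s, e = best_span(start_logits, end_logits)
answer = " ".join(tokens[s:e + 1])  # → "December 2015"
```

Real readers operate on subword tokens and also score a "no answer" option, but the span-selection logic is the same.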
Real-World Example
A software documentation chatbot uses extractive QA to answer developer questions against 500+ pages of API docs. When a developer asks 'What is the rate limit for the messages endpoint?', the system retrieves the rate-limits section and extracts '100 requests per minute per API key' as the answer span. This deflects 67% of developer support tickets, freeing engineering time for complex integration questions.
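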
Common Mistakes
- ✕ Expecting QA models to answer questions not covered by the retrieved documents; models hallucinate when context is insufficient
- ✕ Using a single passage as context; complex questions often require synthesizing information from multiple sources
- ✕ Ignoring answer confidence scores; low-confidence answers should trigger fallback to human support
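The last point, falling back on low confidence, reduces to a threshold check. A minimal sketch, assuming the reader returns a dict with `answer` and `score` fields (the field names and the 0.5 cutoff are illustrative; a real cutoff should be calibrated on held-out queries):

```python
CONFIDENCE_THRESHOLD = 0.5  # assumed cutoff; tune on a validation set

def answer_or_escalate(qa_result, threshold=CONFIDENCE_THRESHOLD):
    """Return the model's answer only when its confidence clears the
    threshold; otherwise signal a handoff to human support."""
    if qa_result["score"] >= threshold:
        return {"type": "answer", "text": qa_result["answer"]}
    return {"type": "escalate", "text": "Routing to a human agent."}

confident = answer_or_escalate({"answer": "December 2015", "score": 0.92})
uncertain = answer_or_escalate({"answer": "maybe 2016", "score": 0.12})
```

Routing low-confidence answers to a human preserves trust: a wrong answer delivered confidently costs more than an honest escalation.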
Related Terms
Reading Comprehension
Reading comprehension is the NLP task of answering questions about a given passage by locating or generating the answer from within the text, serving as the core capability behind document-grounded chatbots and RAG systems.
Information Extraction
Information extraction automatically identifies and structures specific facts from unstructured text—who did what, when, and where—transforming free-form documents into queryable databases.
Natural Language Understanding (NLU)
Natural Language Understanding (NLU) is the AI capability that interprets the meaning behind human text or speech — identifying what the user wants (intent) and extracting key details (entities). NLU is the 'comprehension' layer of a chatbot, translating raw input into structured information the system can act on.
Semantic Search
Semantic search finds knowledge base articles based on the meaning of a query — not just the words used. By converting both queries and documents into vector embeddings, it identifies conceptually similar content even when users use different terminology than the articles, enabling more natural and accurate information retrieval.
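The embedding comparison behind semantic search is a cosine-similarity ranking. A toy sketch with 3-dimensional vectors standing in for real embedding-model output (actual embeddings have hundreds of dimensions and come from a trained encoder):

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product of the vectors over the
    product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical article embeddings (illustrative values).
article_vecs = {
    "reset-password": [0.9, 0.1, 0.0],
    "refund-policy":  [0.0, 0.8, 0.2],
}
# e.g. embedding of "I forgot my login credentials" — no keyword overlap
# with "reset password", but close in meaning, hence close in vector space.
query_vec = [0.85, 0.15, 0.05]

best = max(article_vecs, key=lambda k: cosine(query_vec, article_vecs[k]))
# → "reset-password"
```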
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →