Chain-of-Thought Prompting
Definition
Chain-of-thought (CoT) prompting, introduced by Wei et al. (2022), guides LLMs to generate intermediate reasoning steps before producing a final answer. Instead of asking 'What is 15% of 340?' and letting the model answer directly (often incorrectly), a CoT prompt elicits: 'Let's think step by step: 10% of 340 is 34. 5% is half of that, so 17. 10% + 5% = 34 + 17 = 51.' The reasoning trace serves as working memory, letting the model break a complex problem into manageable sub-steps. Zero-shot CoT is triggered by simply appending 'Let's think step by step' to the prompt; few-shot CoT instead provides worked examples with full reasoning traces. Chain-of-thought is also the mechanism behind reasoning models (o1, o3), which generate extended internal reasoning before answering.
Why It Matters
Chain-of-thought dramatically improves LLM accuracy on tasks requiring multi-step reasoning—math problems, logical deduction, code debugging, and complex question answering. GPT-3 achieves 18% on grade-school math (GSM8K) without CoT; with CoT prompting, the same model reaches 51%. For 99helpers customers handling complex support queries—multi-step troubleshooting, configuration guidance, policy interpretation—CoT prompting helps the model work systematically through the problem rather than pattern-matching to the most likely surface answer. The 'Let's think step by step' instruction is the simplest, highest-impact single prompt modification for improving reasoning quality.
How It Works
CoT prompting comes in several variants:
- Zero-shot CoT—append 'Let's think step by step' before the answer; simple and often surprisingly effective.
- Few-shot CoT—provide 3-5 examples with full reasoning traces before the actual question; higher quality for complex tasks.
- Least-to-most prompting—break the problem into sub-problems, solve each, then combine; effective for compositional tasks.
- Self-consistency—run CoT multiple times with non-zero temperature and take the majority-vote final answer; improves reliability.
- Tree-of-thoughts—generate multiple reasoning paths simultaneously, evaluate them, and select the best; expensive but high quality.
Reasoning models like o1 internalize CoT through RL training, generating thinking automatically without prompt modification.
Chain-of-Thought vs Direct Answer
- Direct Answer (no CoT): Incorrect — skipped intermediate reasoning
- Chain-of-Thought: Correct — explicit reasoning chain
Real-World Example
A 99helpers support bot handles the query: 'My API rate limit is 1000 req/min but I'm seeing throttling errors after only 600 requests in a minute. Could this be a billing issue?' Without CoT: 'Yes, this could be a billing-related rate limit override.' (potentially wrong). With 'Let's think step by step' CoT: 'Let me think through this: 1) The user sees throttling at 600 req/min despite a 1000 req/min limit. 2) Possible causes: (a) per-IP concurrency limits, (b) endpoint-specific limits, (c) burst vs sustained rate limits, (d) billing tier limits. 3) The user specifically asks about billing—checking billing tier is relevant. 4) However, the more likely cause is per-endpoint or burst limits. Conclusion: Throttling at 600 req/min despite a 1000 req/min account limit most likely indicates per-endpoint rate limits rather than billing. Check endpoint-specific documentation.' The CoT trace catches the more likely non-billing cause.
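A diagnostic trace like the one above can be encouraged with a structured CoT template. This is a hypothetical sketch (the template text and `build_diagnostic_prompt` are not from any product); the numbered steps mirror the restate-enumerate-weigh-conclude shape of the example.

```python
# Structured CoT template for support triage: restate the symptom,
# enumerate candidate causes, weigh each, then conclude with a next check.
DIAGNOSTIC_TEMPLATE = """You are a support engineer. Let's think step by step:
1) Restate what the user observes.
2) List every plausible cause (connection limits, endpoint limits, burst limits, billing tier).
3) Weigh each cause against the observed symptoms.
4) State the most likely cause and the next check to run.

User query: {query}
"""

def build_diagnostic_prompt(query: str) -> str:
    """Fill the template with the user's query; send the result to the LLM."""
    return DIAGNOSTIC_TEMPLATE.format(query=query)

print(build_diagnostic_prompt(
    "Throttling at 600 req/min despite a 1000 req/min limit."))
```

Pinning the reasoning structure in the prompt, rather than relying on a bare 'Let's think step by step', makes the trace easier to audit and keeps the model from skipping the cause-enumeration step.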
Common Mistakes
- ✕ Using CoT for all tasks—simple factual queries ('What is the capital of France?') don't benefit from step-by-step reasoning, and the extra tokens waste cost and add latency.
- ✕ Treating CoT reasoning as reliable fact—the reasoning trace is generation, not ground truth; the model can reason through an incorrect chain convincingly.
- ✕ Forgetting that CoT increases output tokens significantly—a CoT response may be 3-5x longer than a direct answer, with corresponding cost implications.
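The cost impact of the 3-5x token overhead is easy to estimate. The prices and token counts below are illustrative placeholders, not real rates from any provider.

```python
# Back-of-envelope comparison of output-token cost with and without CoT.
def response_cost(output_tokens: int, price_per_1k: float) -> float:
    """Cost of a response given a per-1000-output-tokens price."""
    return output_tokens / 1000 * price_per_1k

direct = response_cost(80, price_per_1k=0.01)   # short direct answer
cot = response_cost(80 * 4, price_per_1k=0.01)  # ~4x longer CoT trace
print(f"direct=${direct:.4f}  cot=${cot:.4f}")  # CoT costs 4x more here
```

At high query volume this multiplier dominates the bill, which is why routing simple queries past CoT (the first mistake above) matters in production.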
Related Terms
Reasoning Model
A reasoning model is an LLM that explicitly 'thinks' through problems in an extended internal reasoning process before producing a final answer, trading inference speed for dramatically improved accuracy on complex tasks.
Few-Shot Learning
Few-shot learning provides an LLM with a small number of input-output examples within the prompt, demonstrating the desired task format and behavior without updating model weights.
In-Context Learning
In-context learning is the LLM phenomenon of adapting to new tasks purely from examples or instructions provided in the prompt, without updating model weights—including zero-shot, one-shot, and few-shot scenarios.
Large Language Model (LLM)
A large language model is a neural network trained on vast amounts of text that learns to predict and generate human-like text, enabling tasks like answering questions, writing, translation, and code generation.
LLM Inference
LLM inference is the process of running a trained model to generate a response for a given input, encompassing the forward pass computation, token generation, and the infrastructure required to serve predictions at scale.
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →