Chain-of-Thought Prompting
Definition
Chain-of-thought (CoT) prompting, introduced by Wei et al. (2022), guides LLMs to generate intermediate reasoning steps before producing a final answer. Instead of asking 'What is 15% of 340?' and letting the model answer directly (often incorrectly), a CoT prompt elicits: 'Let's think step by step: 10% of 340 is 34. 5% is half of that, so 17. 10% + 5% = 34 + 17 = 51.' The reasoning trace serves as working memory, letting the model break a complex problem into manageable sub-steps. Zero-shot CoT is triggered by simply appending 'Let's think step by step' to the prompt; few-shot CoT instead provides worked examples with full reasoning traces. Chain-of-thought is also the mechanism behind reasoning models (o1, o3), which generate extended internal reasoning before answering.
Why It Matters
Chain-of-thought dramatically improves LLM accuracy on tasks requiring multi-step reasoning—math problems, logical deduction, code debugging, and complex question answering. GPT-3 achieves 18% on grade-school math (GSM8K) without CoT; with CoT prompting, the same model reaches 51%. For 99helpers customers handling complex support queries—multi-step troubleshooting, configuration guidance, policy interpretation—CoT prompting helps the model work systematically through the problem rather than pattern-matching to the most likely surface answer. The 'Let's think step by step' instruction is the simplest, highest-impact single prompt modification for improving reasoning quality.
How It Works
CoT prompting comes in several variants:
- Zero-shot CoT—append 'Let's think step by step' before the answer; simple and often surprisingly effective.
- Few-shot CoT—provide 3-5 examples with full reasoning traces before the actual question; higher quality for complex tasks.
- Least-to-most prompting—break the problem into sub-problems, solve each, then combine; effective for compositional tasks.
- Self-consistency—run CoT multiple times with non-zero temperature and take the majority-vote final answer; improves reliability.
- Tree-of-thoughts—generate multiple reasoning paths simultaneously, evaluate them, and select the best; expensive but high quality.
Reasoning models like o1 internalize CoT through RL training, generating thinking automatically without prompt modification.
Chain-of-Thought vs Direct Answer
- Direct Answer (no CoT): Incorrect — skipped intermediate reasoning
- Chain-of-Thought: Correct — explicit reasoning chain
Real-World Example
A 99helpers support bot handles the query: 'My API rate limit is 1000 req/min but I'm seeing throttling errors after only 600 requests in a minute. Could this be a billing issue?' Without CoT: 'Yes, this could be a billing-related rate limit override.' (potentially wrong). With 'Let's think step by step' CoT: 'Let me think through this: 1) The user sees throttling at 600 req/min despite a 1000 req/min limit. 2) Possible causes: (a) per-IP concurrency limits, (b) endpoint-specific limits, (c) burst vs sustained rate limits, (d) billing tier limits. 3) The user specifically asks about billing—checking billing tier is relevant. 4) However, the more likely cause is per-endpoint or burst limits. Conclusion: Throttling at 600 req/min despite a 1000 req/min account limit most likely indicates per-endpoint rate limits rather than billing. Check endpoint-specific documentation.' The CoT trace catches the more likely non-billing cause.
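A diagnostic trace like the one above can be encouraged with a structured CoT template. This is a hypothetical sketch (the template text and `build_diagnostic_prompt` are not from any product); the numbered steps mirror the restate-enumerate-weigh-conclude shape of the example.

```python
# Structured CoT template for support triage: restate the symptom,
# enumerate candidate causes, weigh each, then conclude with a next check.
DIAGNOSTIC_TEMPLATE = """You are a support engineer. Let's think step by step:
1) Restate what the user observes.
2) List every plausible cause (connection limits, endpoint limits, burst limits, billing tier).
3) Weigh each cause against the observed symptoms.
4) State the most likely cause and the next check to run.

User query: {query}
"""

def build_diagnostic_prompt(query: str) -> str:
    """Fill the template with the user's query; send the result to the LLM."""
    return DIAGNOSTIC_TEMPLATE.format(query=query)

print(build_diagnostic_prompt(
    "Throttling at 600 req/min despite a 1000 req/min limit."))
```

Pinning the reasoning structure in the prompt, rather than relying on a bare 'Let's think step by step', makes the trace easier to audit and keeps the model from skipping the cause-enumeration step.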
Common Mistakes
- ✕ Using CoT for all tasks—simple factual queries ('What is the capital of France?') don't benefit from step-by-step reasoning, and the extra tokens waste cost and add latency.
- ✕ Treating CoT reasoning as reliable fact—the reasoning trace is generation, not ground truth; the model can reason through an incorrect chain convincingly.
- ✕ Forgetting that CoT increases output tokens significantly—a CoT response may be 3-5x longer than a direct answer, with corresponding cost implications.
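The cost impact of the 3-5x token overhead is easy to estimate. The prices and token counts below are illustrative placeholders, not real rates from any provider.

```python
# Back-of-envelope comparison of output-token cost with and without CoT.
def response_cost(output_tokens: int, price_per_1k: float) -> float:
    """Cost of a response given a per-1000-output-tokens price."""
    return output_tokens / 1000 * price_per_1k

direct = response_cost(80, price_per_1k=0.01)   # short direct answer
cot = response_cost(80 * 4, price_per_1k=0.01)  # ~4x longer CoT trace
print(f"direct=${direct:.4f}  cot=${cot:.4f}")  # CoT costs 4x more here
```

At high query volume this multiplier dominates the bill, which is why routing simple queries past CoT (the first mistake above) matters in production.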
Related Terms
Reasoning Model
A reasoning model is an LLM that explicitly 'thinks' through problems in an extended internal reasoning process before producing a final answer, trading inference speed for dramatically improved accuracy on complex tasks.
Few-Shot Learning
Few-shot learning provides an LLM with a small number of input-output examples within the prompt, demonstrating the desired task format and behavior without updating model weights.
In-Context Learning
In-context learning is the LLM phenomenon of adapting to new tasks purely from examples or instructions provided in the prompt, without updating model weights—including zero-shot, one-shot, and few-shot scenarios.
Large Language Model (LLM)
A large language model is a neural network trained on vast amounts of text that learns to predict and generate human-like text, enabling tasks like answering questions, writing, translation, and code generation.
LLM Inference
LLM inference is the process of running a trained model to generate a response for a given input, encompassing the forward pass computation, token generation, and the infrastructure required to serve predictions at scale.
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →