Token
Definition
A token is the atomic unit of text representation in large language models. LLMs do not process text character by character or word by word—they process tokens, which are subword chunks determined by the model's tokenizer vocabulary. In English, most common words are a single token; longer or rarer words split into two or more tokens. Numbers, punctuation, and whitespace are tokenized separately and often inefficiently. Every LLM interaction involves counting tokens: the input prompt consumes tokens from the context window, the model's response generates output tokens (which cost more per token than input tokens with most providers), and the combined input + output determines the total API cost.
Why It Matters
Tokens are the currency of LLM computing. Understanding tokens helps developers predict and control three critical aspects: cost (most LLM APIs charge separately for input and output tokens), context window utilization (prompts, conversation history, and retrieved documents all compete for the same token budget), and response length (longer responses consume more output tokens, raising both cost and latency). For 99helpers product teams budgeting AI features, token awareness prevents billing surprises—a feature that generates verbose responses at scale can cost 5-10x more than a concise equivalent, making token efficiency an important design consideration.
How It Works
Token counting in practice: using OpenAI's tiktoken library, encoding 'Hello, how can I help you today?' produces 9 tokens: ['Hello', ',', ' how', ' can', ' I', ' help', ' you', ' today', '?']. Code tokenizes differently: 'function processPayment(amount)' becomes roughly 7 tokens. Most LLM APIs (including OpenAI's and Anthropic's) return token usage in the API response, enabling exact cost tracking. Inside the model, tokens flow as integer IDs: each ID is mapped to a dense embedding vector, processed through the transformer layers, and output token IDs are ultimately decoded back to text via their string representations.
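For quick budgeting, a rough estimate based on the ~4-characters-per-token rule of thumb is often enough; exact counts require running the model's own tokenizer (OpenAI publishes tiktoken for this). The sketch below is a hedged heuristic, not a tokenizer, and the prices passed to `estimate_cost` are illustrative placeholders, not any provider's actual rates:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate via the ~4 characters/token rule of thumb
    for English text. For exact counts, use the model's own tokenizer."""
    return max(1, round(len(text) / 4))

def estimate_cost(prompt: str, expected_output_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimated dollar cost of one request, given per-1K-token prices."""
    input_tokens = estimate_tokens(prompt)
    return (input_tokens * input_price_per_1k
            + expected_output_tokens * output_price_per_1k) / 1000

# The 32-character greeting above estimates to 8 tokens—close to,
# but not exactly, what a real tokenizer reports.
n = estimate_tokens("Hello, how can I help you today?")
```

Heuristics like this drift on code, non-English text, and numbers, so treat the result as a budgeting aid and switch to the real tokenizer (or the API's returned usage counts) for billing-accurate numbers.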
Token — Subword Boundaries
Key Facts
- ~4 chars: average token length in English
- 100K+: tokens in a typical vocabulary
- 1 token ≈ ¾ of a word on average
Why subwords? Common words get their own token ("the", "is"). Rare words split into recognizable pieces ("unbelievable" → "un" + "bel" + "iev" + "able"), keeping vocabulary manageable.
Real-World Example
A 99helpers support chatbot generates an average response of 150 tokens. At 1,000 conversations/day with GPT-4o pricing ($0.01 per 1K output tokens), output costs $1.50/day—modest. But analysis reveals that 20% of responses are unnecessarily verbose, averaging 400 tokens. By adding 'Be concise. Respond in 2-3 sentences when possible.' to the system prompt, average output drops to 120 tokens, saving $0.30/day ($109/year). For a product serving 100 customers, this optimization saves $10,900/year while improving user experience through shorter, more direct answers.
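The savings arithmetic above is easy to reproduce. A minimal sketch using the example's own figures (1,000 conversations/day, $0.01 per 1K output tokens):

```python
PRICE_PER_1K_OUTPUT = 0.01       # example's GPT-4o output price, $/1K tokens
CONVERSATIONS_PER_DAY = 1_000

def daily_output_cost(avg_output_tokens: float) -> float:
    """Daily output-token spend at the example's volume and price."""
    return CONVERSATIONS_PER_DAY * avg_output_tokens / 1000 * PRICE_PER_1K_OUTPUT

before = daily_output_cost(150)                 # $1.50/day
after = daily_output_cost(120)                  # $1.20/day with the concise prompt
annual_per_deployment = (before - after) * 365  # ~$109.50/year per deployment
```

Parameterizing the calculation this way makes it trivial to re-run when a provider changes prices or when conversation volume grows.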
Common Mistakes
- ✕ Using word count as a proxy for token count when estimating context window usage or costs—1,000 words is typically 1,300-1,500 tokens.
- ✕ Ignoring that output tokens cost 2-4x more than input tokens in most LLM API pricing—response length is often a bigger cost driver than prompt length.
- ✕ Not accounting for special tokens such as BOS (beginning of sequence) and EOS (end of sequence), which models add automatically and which slightly inflate token counts.
Related Terms
Tokenization
Tokenization converts raw text into a sequence of tokens—the basic units an LLM processes—using algorithms like byte-pair encoding that split text into subword pieces rather than whole words or individual characters.
Context Length
Context length is the maximum number of tokens an LLM can process in a single request—encompassing the system prompt, conversation history, retrieved documents, and the response—determining how much information the model can consider simultaneously.
LLM API
An LLM API is a cloud service interface that provides programmatic access to large language models, allowing developers to send prompts and receive completions without managing model infrastructure.
Max Tokens
Max tokens is an LLM API parameter that limits the maximum number of tokens the model can generate in a single response, controlling response length, cost, and latency.
Byte-Pair Encoding (BPE)
Byte-Pair Encoding (BPE) is the subword tokenization algorithm used by most LLMs to build their vocabulary by iteratively merging the most frequent adjacent byte or character pairs in training text.
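The "iteratively merging the most frequent adjacent pairs" step can be sketched on a toy corpus. This is an illustrative miniature of BPE training, not any production tokenizer; the corpus and merge count are made up for the demonstration:

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Rewrite every word, fusing each occurrence of `pair` into one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: each word as a tuple of characters, mapped to its frequency.
corpus = {tuple("lower"): 5, tuple("lowest"): 2,
          tuple("newer"): 6, tuple("wider"): 3}
for _ in range(2):   # two merge steps learn "er", then "wer"
    corpus = merge_pair(corpus, most_frequent_pair(corpus))
# "lower" is now segmented as ("l", "o", "wer")
```

Real BPE training runs tens of thousands of merges over byte-level input, which is how frequent whole words end up as single tokens while rare words fall back to recognizable subword pieces.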
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →