Token
Definition
A token is the atomic unit of text representation in large language models. LLMs do not process text character by character or word by word—they process tokens, which are subword chunks determined by the model's tokenizer vocabulary. In English, most common words are a single token; longer or rarer words split into two or more tokens. Numbers, punctuation, and whitespace are tokenized separately and often inefficiently. Every LLM interaction involves counting tokens: the input prompt consumes tokens from the context window, the model's response generates output tokens (which cost more per token than input tokens with most providers), and the combined input + output determines the total API cost.
Why It Matters
Tokens are the currency of LLM computing. Understanding tokens helps developers predict and control three critical aspects: cost (most LLM APIs charge separately for input and output tokens), context window utilization (prompts, conversation history, and retrieved documents all compete for the same token budget), and response length (longer responses consume more output tokens, raising both cost and latency). For 99helpers product teams budgeting AI features, token awareness prevents billing surprises—a feature that generates verbose responses at scale can cost 5-10x more than a concise equivalent, making token efficiency an important design consideration.
How It Works
Token counting in practice: using OpenAI's tiktoken library, encoding 'Hello, how can I help you today?' produces 9 tokens: ['Hello', ',', ' how', ' can', ' I', ' help', ' you', ' today', '?']. Code tokenizes differently: 'function processPayment(amount)' becomes roughly 7 tokens. Most LLM APIs (including OpenAI's and Anthropic's) return token usage in the API response, enabling exact cost tracking. Inside the model, tokens flow as integer IDs: each ID is mapped to a dense embedding vector, processed through the transformer layers, and output token IDs are ultimately decoded back to text via their string representations.
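For quick budgeting, a rough estimate based on the ~4-characters-per-token rule of thumb is often enough; exact counts require running the model's own tokenizer (OpenAI publishes tiktoken for this). The sketch below is a hedged heuristic, not a tokenizer, and the prices passed to `estimate_cost` are illustrative placeholders, not any provider's actual rates:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate via the ~4 characters/token rule of thumb
    for English text. For exact counts, use the model's own tokenizer."""
    return max(1, round(len(text) / 4))

def estimate_cost(prompt: str, expected_output_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimated dollar cost of one request, given per-1K-token prices."""
    input_tokens = estimate_tokens(prompt)
    return (input_tokens * input_price_per_1k
            + expected_output_tokens * output_price_per_1k) / 1000

# The 32-character greeting above estimates to 8 tokens—close to,
# but not exactly, what a real tokenizer reports.
n = estimate_tokens("Hello, how can I help you today?")
```

Heuristics like this drift on code, non-English text, and numbers, so treat the result as a budgeting aid and switch to the real tokenizer (or the API's returned usage counts) for billing-accurate numbers.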
Token — Subword Boundaries
Key Facts
- ~4 chars: average token length in English
- 100K+: tokens in a typical vocabulary
- 1 token ≈ ¾ of a word on average
Why subwords? Common words get their own token ("the", "is"). Rare words split into recognizable pieces ("unbelievable" → "un" + "bel" + "iev" + "able"), keeping vocabulary manageable.
Real-World Example
A 99helpers support chatbot generates an average response of 150 tokens. At 1,000 conversations/day with GPT-4o pricing ($0.01 per 1K output tokens), output costs $1.50/day—modest. But analysis reveals that 20% of responses are unnecessarily verbose, averaging 400 tokens. By adding 'Be concise. Respond in 2-3 sentences when possible.' to the system prompt, average output drops to 120 tokens, saving $0.30/day ($109/year). For a product serving 100 customers, this optimization saves $10,900/year while improving user experience through shorter, more direct answers.
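The savings arithmetic above is easy to reproduce. A minimal sketch using the example's own figures (1,000 conversations/day, $0.01 per 1K output tokens):

```python
PRICE_PER_1K_OUTPUT = 0.01       # example's GPT-4o output price, $/1K tokens
CONVERSATIONS_PER_DAY = 1_000

def daily_output_cost(avg_output_tokens: float) -> float:
    """Daily output-token spend at the example's volume and price."""
    return CONVERSATIONS_PER_DAY * avg_output_tokens / 1000 * PRICE_PER_1K_OUTPUT

before = daily_output_cost(150)                 # $1.50/day
after = daily_output_cost(120)                  # $1.20/day with the concise prompt
annual_per_deployment = (before - after) * 365  # ~$109.50/year per deployment
```

Parameterizing the calculation this way makes it trivial to re-run when a provider changes prices or when conversation volume grows.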
Common Mistakes
- ✕ Using word count as a proxy for token count when estimating context window usage or costs—1,000 words is typically 1,300-1,500 tokens.
- ✕ Ignoring that output tokens cost 2-4x more than input tokens in most LLM API pricing—response length is often a bigger cost driver than prompt length.
- ✕ Not accounting for special tokens such as BOS (beginning of sequence) and EOS (end of sequence), which models add automatically and which slightly inflate token counts.
Related Terms
Tokenization
Tokenization converts raw text into a sequence of tokens—the basic units an LLM processes—using algorithms like byte-pair encoding that split text into subword pieces rather than whole words or individual characters.
Context Length
Context length is the maximum number of tokens an LLM can process in a single request—encompassing the system prompt, conversation history, retrieved documents, and the response—determining how much information the model can consider simultaneously.
LLM API
An LLM API is a cloud service interface that provides programmatic access to large language models, allowing developers to send prompts and receive completions without managing model infrastructure.
Max Tokens
Max tokens is an LLM API parameter that limits the maximum number of tokens the model can generate in a single response, controlling response length, cost, and latency.
Byte-Pair Encoding (BPE)
Byte-Pair Encoding (BPE) is the subword tokenization algorithm used by most LLMs to build their vocabulary by iteratively merging the most frequent adjacent byte or character pairs in training text.
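The "iteratively merging the most frequent adjacent pairs" step can be sketched on a toy corpus. This is an illustrative miniature of BPE training, not any production tokenizer; the corpus and merge count are made up for the demonstration:

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Rewrite every word, fusing each occurrence of `pair` into one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: each word as a tuple of characters, mapped to its frequency.
corpus = {tuple("lower"): 5, tuple("lowest"): 2,
          tuple("newer"): 6, tuple("wider"): 3}
for _ in range(2):   # two merge steps learn "er", then "wer"
    corpus = merge_pair(corpus, most_frequent_pair(corpus))
# "lower" is now segmented as ("l", "o", "wer")
```

Real BPE training runs tens of thousands of merges over byte-level input, which is how frequent whole words end up as single tokens while rare words fall back to recognizable subword pieces.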
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →