Base Model
Definition
A base model (also called a foundation model or pre-trained model) is the result of the first training phase: pre-training on large text corpora. Base models are good at one thing: predicting the next token in a sequence. Given 'The capital of France is', a base model reliably outputs 'Paris'. But given 'What is the capital of France?', it might continue with '...and is there anything else you'd like to know about European capitals?' rather than answering the question; it completes text, it does not answer questions. Base models must be instruction-tuned (and preferably RLHF-aligned) before they are usable as conversational assistants. Meta, for example, releases both base models (Llama-3-8B) and instruction-tuned variants (Llama-3-8B-Instruct).
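The next-token behavior described above can be sketched in a few lines. This is a hand-wired toy, not a real model: the contexts, tokens, and probabilities below are invented for illustration, and a real base model would compute the distribution with billions of parameters. The sketch only shows the core mechanic, i.e. mapping a context to a probability distribution over tokens and greedily picking the most likely one.

```python
# Toy sketch of what a base model does at inference time: map a context
# string to a next-token probability distribution, then pick the argmax.
# All probabilities here are invented purely for illustration.

TOY_NEXT_TOKEN_PROBS = {
    "The capital of France is": {"Paris": 0.92, "a": 0.03, "located": 0.02},
    "What is the capital of France?": {
        "\n": 0.40,      # often just continues the text's formatting...
        "What": 0.25,    # ...or generates more questions,
        "Paris": 0.10,   # rather than answering directly.
    },
}

def greedy_next_token(context: str) -> str:
    """Return the highest-probability next token for a given context."""
    probs = TOY_NEXT_TOKEN_PROBS[context]
    return max(probs, key=probs.get)

print(greedy_next_token("The capital of France is"))        # 'Paris'
print(greedy_next_token("What is the capital of France?"))  # a newline, not an answer
```

Note how the statement-style prompt yields the factual completion while the question-style prompt yields a continuation rather than an answer, which is exactly the gap instruction tuning closes.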
Why It Matters
Understanding the distinction between base and instruction-tuned models prevents a common deployment mistake: using a base model for chat applications. Base models are the starting point for fine-tuning, not the endpoint. For teams building custom models, starting from a base model and applying task-specific instruction tuning provides more flexibility than starting from an already-instruction-tuned model, whose built-in behaviors can be hard to override. For teams building applications, the instruction-tuned variants (the '-Instruct' or '-Chat' versions) are almost always the right choice; base models are primarily relevant to ML teams doing custom fine-tuning.
How It Works
Base model vs instruction-tuned behavior: same model, different training. Given the prompt 'Translate to French: Hello, how are you?', the Llama-3-8B base model will likely continue with more translation examples or prose about translation rather than answering. Given the same prompt, Llama-3-8B-Instruct outputs 'Bonjour, comment allez-vous?' directly. The instruction-tuned variant learned to interpret prompts as instructions through supervised fine-tuning (SFT) on (instruction, response) pairs. When fine-tuning for a specific domain, starting from the instruction-tuned model is often more efficient if its capabilities are close to what you need; starting from the base model provides a cleaner slate if you need a very different behavior style.
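A common implementation detail in SFT on (instruction, response) pairs is loss masking: the training loss is computed only on the response tokens, so the model learns to produce responses rather than to reproduce instructions. The sketch below illustrates this convention with a plain-Python negative log-likelihood; the per-token probabilities and the prompt/response split are invented for illustration.

```python
import math

# Sketch of SFT loss masking: the loss covers only response tokens
# (mask = 1), while instruction tokens (mask = 0) are ignored.
# The probabilities below are invented purely for illustration.

def masked_nll(token_probs, loss_mask):
    """Mean negative log-likelihood over unmasked (response) positions.

    token_probs: model's probability for the correct token at each position.
    loss_mask:   1 for response tokens, 0 for instruction tokens.
    """
    losses = [-math.log(p) for p, m in zip(token_probs, loss_mask) if m]
    return sum(losses) / len(losses)

# positions:     [---- instruction ----]  [--- response ---]
token_probs = [0.10, 0.20, 0.15, 0.05,  0.90, 0.80, 0.85]
loss_mask   = [0,    0,    0,    0,     1,    1,    1   ]

loss = masked_nll(token_probs, loss_mask)
# Only the three response positions contribute; the low-probability
# instruction tokens add nothing to the loss.
print(round(loss, 4))
```

The effect is that gradient updates push the model toward generating good responses given instructions, rather than toward modeling the instructions themselves.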
Base Model vs Instruction-Tuned Model
Base Model: trained only via next-token prediction on large text corpora; completes text rather than following instructions; example: Llama-3-8B; typical use: starting point for custom fine-tuning.
Instruct Model: further trained with SFT on (instruction, response) pairs, often followed by RLHF; interprets prompts as instructions and responds directly; example: Llama-3-8B-Instruct; typical use: chat and other user-facing applications.
Real-World Example
A 99helpers ML team builds a customer support assistant. They compare two starting points: (1) Llama-3-8B base model fine-tuned on 10,000 support conversations; (2) Llama-3-8B-Instruct fine-tuned on the same data. After the same number of training steps, the instruction-tuned starting point achieves 83% on their benchmark while the base model starting point achieves 78%. The instruction-tuned model already understands conversational structure—the fine-tuning only needs to teach domain knowledge, not the fundamental assistant behavior. They choose Instruct as the starting point, saving the compute that the base model fine-tune would need to re-learn instruction following.
Common Mistakes
- ✕Deploying a base model for user-facing applications—base models are text completers and will produce unpredictable, unhelpful responses for conversational use cases.
- ✕Assuming base models and instruction-tuned models are interchangeable for fine-tuning starting points—the right choice depends on how different your target behavior is from the instruction-tuned model's default behavior.
- ✕Confusing 'base' with 'bad'—base models are not lower quality than instruction-tuned models; they have a different capability profile optimized for text completion rather than instruction following.
Related Terms
Pre-Training
Pre-training is the foundational phase of LLM development where the model learns language understanding and world knowledge by predicting the next token across vast text corpora, before any task-specific optimization.
Instruction Tuning
Instruction tuning fine-tunes a pre-trained language model on diverse (instruction, response) pairs, transforming a text-completion model into an assistant that reliably follows human directives.
Fine-Tuning
Fine-tuning adapts a pre-trained LLM to a specific task or domain by continuing training on a smaller, curated dataset, improving performance on targeted use cases while preserving general language capabilities.
Open-Source LLM
An open-source LLM is a language model with publicly available weights that anyone can download, run locally, fine-tune, and deploy without per-query licensing fees, enabling private deployment and customization.
Large Language Model (LLM)
A large language model is a neural network trained on vast amounts of text that learns to predict and generate human-like text, enabling tasks like answering questions, writing, translation, and code generation.