Model Parameters
Definition
Model parameters are the trainable weights in a neural network: the values in attention projection matrices, feed-forward network weight matrices, embedding tables, and layer normalization parameters that are updated during training to minimize the prediction loss. A '7 billion parameter' model has 7×10^9 such weights. These parameters collectively encode the model's knowledge—world facts, language patterns, reasoning heuristics, and coding skills—in a distributed, subsymbolic form. More parameters generally mean more capacity to store knowledge and perform complex reasoning, which is why larger models tend to score better on benchmarks. Each float16 parameter occupies 2 bytes, so a 7B model needs ~14GB just for its weights.
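The weight-memory arithmetic above can be sketched in a few lines. This is a rough estimate only; real deployments need additional memory for activations and the KV cache:

```python
# Rough weight-only memory estimate at different numeric precisions.
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    return n_params * bytes_per_param / 1e9

n = 7e9  # a "7B" model
print(weight_memory_gb(n, 2.0))   # float16: 14.0 GB
print(weight_memory_gb(n, 1.0))   # int8:     7.0 GB
print(weight_memory_gb(n, 0.5))   # int4:     3.5 GB
```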
Why It Matters
Parameter count is the most commonly cited model size metric and a primary determinant of quality, cost, and hardware requirements. A 7B model fits on a consumer GPU (8-16GB VRAM in int8); a 70B model requires a high-end server GPU (80GB+ VRAM in float16); a 405B model requires multiple GPUs. For 99helpers teams evaluating models, parameter count helps quickly filter options: 7B models are fast and cheap but have limited reasoning; 13B models offer a quality bump with manageable hardware; 70B models provide near-frontier quality for self-hosting; 400B+ models are typically accessed via API only. Quality doesn't scale uniformly with parameters—training data quality and fine-tuning matter equally.
How It Works
Parameter scaling in transformers: N_params ≈ 12 × d_model² × n_layers (the dominant term for large models; it neglects embeddings, biases, and normalization parameters). A 7B model might have d_model=4096 and n_layers=32: 12 × 4096² × 32 ≈ 6.4B parameters. Each layer has attention weights (4 matrices of d_model × d_model), feed-forward weights (2 matrices of d_model × 4d_model), and normalization parameters. The embedding table (vocab_size × d_model) adds ~0.5B parameters for typical vocabulary sizes. Memory for inference = model weights + activations + KV cache. Float16 weights: 2 bytes × N_params ≈ 14GB for 7B, ≈ 140GB for 70B. INT4 quantization: ~0.5 bytes × N_params ≈ 3.5GB for 7B.
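The per-layer breakdown above can be turned into a small estimator. The 128,000-token vocabulary is an assumption chosen to match the "~0.5B embedding parameters" figure; biases and normalization parameters are ignored, as in the approximation:

```python
# Sketch of the 12·d_model²·n_layers approximation for a standard dense
# transformer, plus the embedding table. Vocab size is an assumed value.
def approx_params(d_model: int, n_layers: int, vocab_size: int = 128_000) -> int:
    attn = 4 * d_model * d_model       # Q, K, V, and output projections
    ffn = 2 * d_model * (4 * d_model)  # up- and down-projection (hidden = 4d)
    per_layer = attn + ffn             # = 12 * d_model**2
    embed = vocab_size * d_model       # token embedding table
    return n_layers * per_layer + embed

print(approx_params(4096, 32))  # ≈ 7.0e9 for the 7B-class config above
```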
Parameter Scale Comparison
[Chart: capability (benchmark score) and generation speed (tokens/sec, higher = faster) across parameter scales]
Real-World Example
A 99helpers team evaluates three Llama-3 variants: 8B, 70B, and 405B. The 8B model (16GB in float16) runs on their 2x A10G servers at 80 tokens/second, costs $0.0001/query, and achieves 71% on their support benchmark. The 70B model (140GB in float16) requires their 4x A100 server, runs at 25 tokens/second, costs $0.0008/query, and achieves 84% on the benchmark. The 405B model is accessed via API at $0.003/query and achieves 89% on the benchmark. They deploy 8B for simple FAQ queries (70% of volume), 70B for technical queries (25%), and the 405B API for complex edge cases (5%)—optimizing the quality/cost tradeoff across query tiers.
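The blended per-query cost of this tiered routing follows directly from the numbers in the example; the score blend assumes all three models were scored on the same internal benchmark, so it is only indicative:

```python
# Blended cost/quality of the three-tier routing described above.
# (label, traffic share, $/query, benchmark score)
tiers = [
    ("8B self-hosted",  0.70, 0.0001, 71),
    ("70B self-hosted", 0.25, 0.0008, 84),
    ("405B via API",    0.05, 0.0030, 89),
]
blended_cost = sum(share * cost for _, share, cost, _ in tiers)
blended_score = sum(share * score for _, share, _, score in tiers)
print(f"${blended_cost:.5f}/query")  # $0.00042/query
print(f"~{blended_score:.0f}% blended benchmark score")
```

Routing 95% of traffic to self-hosted models keeps the blended cost far below the API-only $0.003/query while giving up only a few benchmark points on the bulk of queries.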
Common Mistakes
- ✕Using parameter count as the sole quality proxy—training data quality, fine-tuning methodology, and architecture details all significantly affect quality independently of parameter count.
- ✕Confusing active parameters (in MoE models) with total parameters—Mixtral-8x7B has 47B total parameters but activates only ~13B per token; comparing active parameters is more relevant for inference cost.
- ✕Underestimating total memory requirements—model weights alone don't determine hardware requirements; KV cache, activations, and overhead can double total GPU memory usage.
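The third mistake is easy to quantify. The sketch below estimates KV-cache memory for a hypothetical 7B-class configuration (the layer count, head count, head dimension, context length, and batch size are illustrative assumptions, not a specific model's values):

```python
# KV-cache memory: 2 tensors (K and V) per layer, cached for every token
# in the context, for every sequence in the batch.
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per: int = 2) -> float:
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per / 1e9

# 32 layers, 32 KV heads of dim 128, 4096-token context, batch of 8, float16
print(kv_cache_gb(32, 32, 128, 4096, 8))  # ~17.2 GB on top of ~14 GB of weights
```

At this (assumed) batch size the cache alone exceeds the weights, which is why grouped-query attention and quantized caches are common in production serving.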
Related Terms
Large Language Model (LLM)
A large language model is a neural network trained on vast amounts of text that learns to predict and generate human-like text, enabling tasks like answering questions, writing, translation, and code generation.
Scaling Laws
Scaling laws describe predictable mathematical relationships between LLM performance and scale—model size, training data, and compute—enabling researchers to forecast model capability improvements before building larger models.
Model Quantization
Model quantization reduces the numerical precision of LLM weights from 32-bit or 16-bit floats to 8-bit or 4-bit integers, dramatically reducing memory requirements and inference costs with minimal quality loss.
Mixture of Experts (MoE)
Mixture of Experts is an LLM architecture where a router dynamically activates only a subset of specialized 'expert' neural network layers for each token, achieving high model capacity while keeping per-token compute cost manageable.
Foundation Model
A foundation model is a large AI model trained on broad, diverse data that can be adapted to a wide range of downstream tasks through fine-tuning or prompting, serving as a base for many applications.
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →