Model Card
Definition
Model cards, introduced by Mitchell et al. (2019) at Google, are a documentation standard for machine learning models. A complete model card includes: a model description (architecture, parameters, training compute), training data (sources, size, date range, notable characteristics), intended use cases and out-of-scope uses, performance metrics on relevant benchmarks, evaluation data and methodology, known biases and fairness considerations, ethical considerations, and caveats or limitations. Major model providers publish model cards alongside their releases: Meta's Llama 3, Mistral's models, and Hugging Face model repositories all include them. OpenAI and Anthropic publish system cards and model spec documents that serve similar purposes.
Why It Matters
Model cards are essential for responsible AI deployment. They answer critical questions before deploying a model: Does this model's training data include my domain? What tasks is it explicitly optimized for? Has it been evaluated for bias on relevant demographic dimensions? What are its known failure modes? For 99helpers customers integrating LLM APIs, reviewing model cards and system cards helps set appropriate expectations about model behavior, identify potential risks for their specific use case (e.g., potential biases in customer-facing applications), and satisfy enterprise due diligence requirements.
How It Works
A typical model card follows a standardized template with these sections:
- Model Details: name, version, type, training date, developers, license
- Intended Use: primary intended uses and out-of-scope uses
- Factors: relevant factors such as language, domain, and demographic groups evaluated
- Metrics: performance metrics and how they are measured
- Evaluation Data: datasets used for evaluation and their characteristics
- Training Data: description of the training data
- Quantitative Analyses: performance metrics disaggregated by the listed factors
- Ethical Considerations: potential harms and mitigations
- Caveats and Recommendations: remaining limitations and usage guidance

The Hugging Face Hub treats model cards as a first-class feature, displaying each model's card prominently on its page.
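To make the template concrete, the sketch below represents the sections listed above as a small Python dataclass and renders them to Markdown (on the Hugging Face Hub, a model card is simply the repository's README.md, optionally with YAML metadata). The class, field names, and card contents are illustrative assumptions, not an official schema:

```python
from dataclasses import dataclass

@dataclass
class ModelCard:
    # Section names follow the Mitchell et al. (2019) template;
    # the contents below are hypothetical placeholders.
    model_details: dict
    intended_use: list
    out_of_scope: list
    metrics: dict          # benchmark name -> reported score
    training_data: str
    ethical_considerations: str
    caveats: str

    def to_markdown(self) -> str:
        """Render the card as a simple Markdown document."""
        lines = [f"# Model Card: {self.model_details.get('name', 'unknown')}"]
        lines.append("## Intended Use")
        lines += [f"- {u}" for u in self.intended_use]
        lines.append("## Out-of-Scope Uses")
        lines += [f"- {u}" for u in self.out_of_scope]
        lines.append("## Metrics")
        lines += [f"- {name}: {score}" for name, score in self.metrics.items()]
        lines.append("## Training Data")
        lines.append(self.training_data)
        lines.append("## Ethical Considerations")
        lines.append(self.ethical_considerations)
        lines.append("## Caveats and Recommendations")
        lines.append(self.caveats)
        return "\n".join(lines)

card = ModelCard(
    model_details={"name": "ExampleLM-7B-Instruct", "license": "Apache 2.0"},
    intended_use=["Chat assistance", "Summarization"],
    out_of_scope=["Medical or legal advice without human review"],
    metrics={"MMLU": "86.4%", "HumanEval": "79.3%"},
    training_data="Web text and curated instruction data (illustrative).",
    ethical_considerations="May reflect biases present in web training data.",
    caveats="Validate on your own domain before deployment.",
)
print(card.to_markdown())
```

Real-world tooling builds on the same idea: the `huggingface_hub` Python library, for example, exposes a `ModelCard` class for loading and editing cards programmatically.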
Model Card — Standardized Documentation for an LLM

ExampleLM-7B-Instruct
Version 2.1 · Released 2025-08-01 · Apache 2.0
Developer: Example AI Lab
Card sections: Model info · Intended use · Out of scope

Evaluation results:
- MMLU: 86.4% (Knowledge)
- HumanEval: 79.3% (Coding)
- TruthfulQA: 71.2% (Honesty)
- MT-Bench: 8.7 / 10 (Chat quality)

Limitations & biases: May hallucinate facts with confidence. Knowledge cutoff: Aug 2025. Performance degrades on low-resource languages. Not suitable for high-stakes decisions without human review.
Real-World Example
A 99helpers enterprise customer in the healthcare sector evaluates LLMs for a medical documentation assistant. Reviewing the model cards for their candidate models, they find: Model A's training data includes medical literature (MIMIC, PubMed)—good domain fit. Model B was trained primarily on web text with limited medical content—poor domain fit. Model A's card lists known limitations: 'May confuse similar drug names with low frequency in training data'—a significant risk for prescriptions. This model card information allows them to make an informed decision to use Model A with additional validation steps for drug name accuracy.
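The due-diligence comparison in this example can be made systematic. The sketch below screens candidate models against a hard domain requirement drawn from their model cards; the candidate names, card facts, and scoring rule are hypothetical, not real evaluation data:

```python
# Illustrative model-card screening: each candidate dict summarizes
# facts pulled manually from that model's card (hypothetical values).
candidates = {
    "Model A": {
        "domain_data": {"medical literature", "web text"},
        "known_limitations": ["may confuse similar drug names"],
        "license_ok": True,
    },
    "Model B": {
        "domain_data": {"web text"},
        "known_limitations": [],
        "license_ok": True,
    },
}

def screen(candidates, required_domain):
    """Keep models whose card shows the required domain coverage and an
    acceptable license; surface documented limitations that require
    extra validation steps before deployment."""
    results = {}
    for name, card in candidates.items():
        fits = required_domain in card["domain_data"] and card["license_ok"]
        results[name] = {
            "pass": fits,
            "follow_up": card["known_limitations"] if fits else [],
        }
    return results

print(screen(candidates, "medical literature"))
# Model A passes but flags drug-name confusion for extra validation;
# Model B fails on domain coverage.
```

A screen like this only filters on what the card discloses; as the Common Mistakes below note, sparse cards still call for independent evaluation.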
Common Mistakes
- ✕ Skipping model card review before production deployment — model cards document known limitations and biases that can manifest as production failures.
- ✕ Treating model cards as guarantees — they describe known issues and evaluations at training time; new failure modes can emerge in production.
- ✕ Assuming model cards are comprehensive — coverage varies widely; some cards are sparse, requiring additional due diligence through independent evaluation.
Related Terms
LLM Benchmark
An LLM benchmark is a standardized evaluation dataset and scoring methodology used to compare model capabilities across tasks like reasoning, knowledge, coding, and language understanding.
Model Evaluation
Model evaluation is the systematic process of measuring an LLM's performance on relevant tasks and quality dimensions, guiding decisions about model selection, fine-tuning, and deployment readiness.
Model Alignment
Model alignment is the process of training LLMs to behave in ways that are helpful, harmless, and honest, ensuring outputs match human values and intentions rather than just optimizing for text prediction.
Safety Training
Safety training is the process of fine-tuning LLMs to refuse harmful requests, avoid dangerous content generation, and behave safely across adversarial inputs while maintaining helpfulness for legitimate use cases.
Foundation Model
A foundation model is a large AI model trained on broad, diverse data that can be adapted to a wide range of downstream tasks through fine-tuning or prompting, serving as a base for many applications.