LLM API
Definition
An LLM API (Large Language Model Application Programming Interface) is a hosted service that exposes LLM capabilities through standard HTTP endpoints, typically RESTful with JSON request/response formats. Developers send a prompt (and optional parameters like temperature, max_tokens, and model selection) to the API and receive a generated response. Major providers include OpenAI (GPT-4o, GPT-4o-mini), Anthropic (Claude 3.5 Sonnet, Claude 3 Haiku), Google (Gemini 1.5 Pro, Gemini Flash), and Mistral. LLM APIs handle model serving infrastructure, scaling, and maintenance—enabling teams to build AI applications without ML ops expertise. Pricing is usage-based: per input token and per output token.
Why It Matters
LLM APIs are the primary way application developers access AI capabilities. Building and hosting your own LLM requires substantial ML infrastructure expertise and capital; APIs provide instant access to frontier model quality with per-query pricing and zero infrastructure investment. For 99helpers customers, LLM APIs are the backbone of AI chatbot features: the platform calls an LLM API for each user message, passing the retrieved knowledge base context alongside the user's question. Understanding API pricing, rate limits, and model selection helps teams optimize costs and maintain quality at scale.
How It Works
A standard LLM API call (OpenAI-compatible format): POST /v1/chat/completions with JSON body: {model: 'gpt-4o', messages: [{role: 'system', content: 'You are a support assistant.'}, {role: 'user', content: 'How do I reset my password?'}], temperature: 0.3, max_tokens: 500}. The response includes: choices[0].message.content (the generated text), usage.prompt_tokens, usage.completion_tokens (for billing), finish_reason ('stop', 'length', or 'content_filter'). Streaming responses (stream: true) return tokens as server-sent events, enabling progressive display in the UI. Most providers offer an OpenAI-compatible API, enabling drop-in provider switching.
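The request shape described above can be sketched as a small helper that builds an OpenAI-compatible payload. The base URL, API key, and prompts below are placeholders, and the actual network call is left commented out since it needs a real key:

```javascript
// Sketch: building an OpenAI-compatible chat completion request.
// Endpoint and field names follow the format described above; the key
// and base URL are placeholder values, not real credentials.
function buildChatRequest({ apiKey, baseUrl, model, systemPrompt, userMessage,
                            temperature = 0.3, maxTokens = 500 }) {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: userMessage },
        ],
        temperature,
        max_tokens: maxTokens,
      }),
    },
  };
}

const req = buildChatRequest({
  apiKey: "YOUR_API_KEY",                // placeholder
  baseUrl: "https://api.openai.com",
  model: "gpt-4o",
  systemPrompt: "You are a support assistant.",
  userMessage: "How do I reset my password?",
});
// const res = await fetch(req.url, req.options);
// const data = await res.json();
// console.log(data.choices[0].message.content, data.usage);
```

Separating payload construction from the network call keeps it testable and makes it easy to swap the base URL when moving between OpenAI-compatible providers.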
[Figure: LLM API — Request/Response Flow. The client sends POST /v1/chat/completions with model: gpt-4o, messages: [...], temperature: 0.7, max_tokens: 500, and an Authorization: Bearer sk-… header. The API authenticates the key, checks rate limits, and routes the request to the model. The 200 OK response carries choices[0].message, usage.prompt_tokens: 312, usage.completion_tokens: 148, and finish_reason: "stop". Side panels summarize rate limits and an example cost model.]

Streaming mode (stream: true) returns server-sent events — one chunk per token — enabling the UI to display output incrementally.
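Because streamed output arrives as server-sent events, the client needs to parse each event and append the new text. This is a minimal sketch assuming the OpenAI-style chunk shape, where each event's choices[0].delta.content carries the incremental text and the stream ends with a [DONE] sentinel; other providers use different chunk formats:

```javascript
// Sketch: extracting text from OpenAI-style streaming chunks (stream: true).
// Each server-sent event line looks like `data: {json}`; the stream ends
// with `data: [DONE]`.
function extractDeltas(sseText) {
  const out = [];
  for (const line of sseText.split("\n")) {
    if (!line.startsWith("data: ")) continue;   // skip blank lines and comments
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") break;            // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    const delta = chunk.choices?.[0]?.delta?.content;
    if (delta) out.push(delta);                 // collect each token's text
  }
  return out.join("");
}
```

In a real UI each delta would be appended to the display as it arrives rather than joined at the end; the joining here just shows that the concatenated deltas reconstruct the full response.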
Real-World Example
A 99helpers platform integration uses the Anthropic API: const response = await anthropic.messages.create({model: 'claude-3-5-sonnet-20241022', max_tokens: 1024, system: systemPrompt, messages: [{role: 'user', content: userMessage}]}). The response is streamed to the user interface for immediate feedback while the full response completes. API usage tracking shows 2.1M input tokens and 450K output tokens per day across all customers. At Claude 3.5 Sonnet pricing ($0.003/1K input, $0.015/1K output), this costs $6.30 + $6.75 = $13.05/day. Switching 40% of queries to Claude 3 Haiku ($0.00025/$0.00125 per 1K tokens) for simple factual queries, assuming token volume is proportional to query count, saves about $4.79/day (roughly $1,750/year).
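The cost arithmetic above can be reproduced with a small helper. Prices are per 1K tokens as quoted in the example; the 40/60 traffic split is assumed to apply proportionally to token counts:

```javascript
// Sketch: per-day API cost from token counts and per-1K-token prices.
function dailyCost(inputTokens, outputTokens, inputPricePer1K, outputPricePer1K) {
  return (inputTokens / 1000) * inputPricePer1K
       + (outputTokens / 1000) * outputPricePer1K;
}

// All traffic on Claude 3.5 Sonnet: $6.30 input + $6.75 output.
const allSonnet = dailyCost(2_100_000, 450_000, 0.003, 0.015);
console.log(allSonnet.toFixed(2)); // "13.05"

// Route 40% of traffic (assumed proportional in tokens) to Claude 3 Haiku.
const sonnetShare = dailyCost(2_100_000 * 0.6, 450_000 * 0.6, 0.003, 0.015);
const haikuShare = dailyCost(2_100_000 * 0.4, 450_000 * 0.4, 0.00025, 0.00125);
console.log((allSonnet - (sonnetShare + haikuShare)).toFixed(2)); // daily savings
```

The same function works for any provider's published per-token pricing, which makes it easy to compare routing strategies before committing to one.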
Common Mistakes
- ✕ Hard-coding a single LLM provider without an abstraction layer—switching providers later requires touching every API call in the codebase.
- ✕ Ignoring rate limits until hitting them in production—LLM APIs have requests-per-minute and tokens-per-minute limits that require retry logic and queue management.
- ✕ Not implementing streaming for user-facing features—users experience much better UX when tokens appear progressively rather than waiting for the full response.
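The rate-limit point above is usually handled with a retry wrapper. This is a sketch only: callApi stands in for any provider call, and the status codes and backoff constants are illustrative assumptions, not a specific provider's documented behavior:

```javascript
// Sketch: retry with exponential backoff for rate-limited API calls.
// Retries on 429 (rate limited) and 503 (overloaded); other errors
// propagate immediately.
async function withRetry(callApi, { maxAttempts = 5, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await callApi();
    } catch (err) {
      const retryable = err.status === 429 || err.status === 503;
      if (!retryable || attempt === maxAttempts - 1) throw err;
      // Exponential backoff with jitter: ~500ms, ~1s, ~2s, ...
      const delay = baseDelayMs * 2 ** attempt + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage: withRetry(() => client.chat.completions.create({...}))
```

Production systems typically pair this with a queue so that bursts of traffic are smoothed below the tokens-per-minute limit instead of being retried blindly.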
Related Terms
Large Language Model (LLM)
A large language model is a neural network trained on vast amounts of text that learns to predict and generate human-like text, enabling tasks like answering questions, writing, translation, and code generation.
LLM Inference
LLM inference is the process of running a trained model to generate a response for a given input, encompassing the forward pass computation, token generation, and the infrastructure required to serve predictions at scale.
Token
A token is the basic unit of text an LLM processes—roughly 4 characters or 3/4 of an English word. LLM APIs charge per token, context windows are measured in tokens, and generation speed is measured in tokens per second.
LLM Router
An LLM router dynamically selects which language model to use for each query based on complexity, cost requirements, or domain, routing simple queries to cheaper models and complex queries to more capable ones.
Model Provider
A model provider is a company that trains and serves large language models through APIs—including OpenAI, Anthropic, Google, Mistral, and Meta—offering different models with varying capability, cost, and privacy characteristics.
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →