Api.Airforce
API REFERENCE

Chat completions

Generate chat responses across 100+ models from one API. Drop-in compatible with OpenAI Chat Completions, Anthropic Messages, and OpenAI Responses.

Authentication

Every request needs a Bearer token (your Airforce API key). The Anthropic x-api-key header is also accepted on /v1/messages for SDK compatibility.

Authorization: Bearer sk-air-YOUR_API_KEY
# alt for /v1/messages:
x-api-key: sk-air-YOUR_API_KEY

POST /v1/chat/completions

OpenAI-compatible Chat Completions. Works with the official openai SDK by overriding base_url to https://api.airforce/v1.

POST https://api.airforce/v1/chat/completions

Request body

Parameter         Type             Required  Description
model             string           Required  Model ID. Use GET /v1/models to discover available IDs.
messages          array            Required  Conversation history. Each entry has { role: "system" | "user" | "assistant" | "tool", content }. Content is a string or an array of content blocks (vision, see below).
max_tokens        integer          Optional  Maximum number of tokens to generate. Capped at the model's max_output_tokens.
temperature       float            Optional  Sampling temperature, 0–2. Lower is more deterministic. Default depends on the upstream provider.
top_p             float            Optional  Nucleus sampling. Use either temperature or top_p, not both.
stream            boolean          Optional  When true, response is a stream of Server-Sent Events. See "Streaming" below.
stream_options    object           Optional  { include_usage: boolean }. When include_usage is true the final SSE chunk carries the usage block.
stop              string | array   Optional  Up to 4 stop sequences. Generation halts as soon as one is produced.
tools             array            Optional  Function definitions the model may call. See "Tool calling" below.
tool_choice       string | object  Optional  "auto" (default), "none", or { type: "function", function: { name } } to force a specific call.
response_format   object           Optional  { type: "json_object" } forces the model to emit valid JSON. Ignored for models that do not support it.
reasoning_effort  string           Optional  OpenAI o1/o3-style reasoning depth: "low" | "medium" | "high". See "Reasoning & thinking".
thinking          string | object  Optional  Cross-provider thinking switch. "on" | "off" | "auto", or Anthropic-shape { type: "enabled", budget_tokens: N }. See "Reasoning & thinking".
thinking_budget   integer          Optional  Token cap for the model's reasoning trace (when the provider exposes one).
ignore_defaults   boolean          Optional  Skip the user's saved per-model default parameters (configured in dashboard) for this request.

Basic example

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 200,
    "temperature": 0.7
  }'
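For readers who prefer Python over curl, the same request can be sketched with only the standard library. The helper name build_chat_request is illustrative and not part of any SDK; it only constructs the request object and sends nothing.

```python
import json
import urllib.request

API_URL = "https://api.airforce/v1/chat/completions"


def build_chat_request(api_key: str, model: str, messages: list, **params) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions.

    Extra keyword arguments (max_tokens, temperature, ...) are copied
    into the JSON body unchanged.
    """
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "sk-air-YOUR_API_KEY",
    "gpt-5.1-chat",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=200,
    temperature=0.7,
)
```

Passing the result to urllib.request.urlopen(request) would perform the call; the response body is the JSON document described under "Response shape".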

Response shape

Field                    Type     Required  Description
id                       string   Optional  Stable completion ID, e.g. "chatcmpl-abc123".
object                   string   Optional  "chat.completion" for non-streamed, "chat.completion.chunk" for streamed.
created                  integer  Optional  Unix timestamp (seconds).
model                    string   Optional  Echo of the requested model ID.
choices                  array    Optional  Array of completion candidates: [{ index, message: { role, content, tool_calls? }, finish_reason }].
choices[].finish_reason  string   Optional  "stop" | "length" | "tool_calls" | "content_filter".
usage                    object   Optional  { prompt_tokens, completion_tokens, total_tokens, completion_tokens_details?, prompt_tokens_details?, cache_creation_input_tokens?, cache_creation? }. completion_tokens_details.reasoning_tokens is set when the model produced a reasoning trace. Cache fields appear when the upstream returned prompt-caching info: prompt_tokens_details.cached_tokens reports cache reads (OpenAI standard), cache_creation_input_tokens aggregates writes, and cache_creation.ephemeral_5m_input_tokens / ephemeral_1h_input_tokens give the TTL split.
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-5.1-chat",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The capital of France is Paris."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 8,
    "total_tokens": 28
  }
}
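A small sketch of reading the fields above from a decoded response. The function name summarize_completion and the shape of its return value are illustrative choices, not part of the API.

```python
def summarize_completion(response: dict) -> dict:
    """Pull the commonly needed fields out of a non-streamed completion."""
    choice = response["choices"][0]
    usage = response.get("usage") or {}
    details = usage.get("completion_tokens_details") or {}
    return {
        "text": choice["message"].get("content"),
        "finish_reason": choice["finish_reason"],
        # "length" means generation stopped at max_tokens, so the text may be cut off.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage.get("total_tokens"),
        # Only present when the model produced a reasoning trace.
        "reasoning_tokens": details.get("reasoning_tokens", 0),
    }
```

Checking finish_reason before using the text is a cheap way to catch truncated answers and pending tool calls.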

Reasoning & thinking

Models that support extended reasoning expose a thinking trace alongside the regular output. Airforce normalises three different upstream conventions into one set of canonical parameters that work across all reasoning-capable models.

Check supports_reasoning: true on a model in GET /v1/models to know which IDs accept these parameters.

Canonical parameters

Parameter         Type             Required  Description
reasoning_effort  string           Optional  "low" | "medium" | "high". OpenAI o1/o3, GPT-5 reasoning models, and any router that maps onto them.
thinking          string | object  Optional  "on" | "off" | "auto" for a quick toggle, or { type: "enabled", budget_tokens: N } for the Anthropic-native shape. Maps to Claude extended thinking, Gemini thinking, and DeepSeek reasoning.
thinking_budget   integer          Optional  Maximum tokens the model may spend reasoning before emitting visible output. Mirrors budget_tokens.

Reasoning effort (OpenAI-style)

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "o3-mini",
    "messages": [{"role": "user", "content": "Prove the Pythagorean theorem."}],
    "reasoning_effort": "high"
  }'

Extended thinking (Anthropic-style)

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "Plan a 7-day Italy trip."}],
    "thinking": {"type": "enabled", "budget_tokens": 4000}
  }'

The reasoning trace itself appears in choices[0].message.reasoning_content (OpenAI shape) or as thinking blocks in content (Anthropic shape). Reasoning tokens are billed and reported in usage.completion_tokens_details.reasoning_tokens.
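Because the trace can arrive in either shape, client code may want one accessor that handles both. The sketch below assumes that Anthropic-shape thinking blocks carry their text under a "thinking" key, as in Anthropic's native format; this page does not spell out the block layout, so treat that key as an assumption. The function name is illustrative.

```python
def extract_reasoning(message: dict) -> str:
    """Return the reasoning trace from an assistant message, or "" if none."""
    # OpenAI shape: a flat string next to the regular content.
    trace = message.get("reasoning_content")
    if trace:
        return trace
    # Anthropic shape: content is a list of blocks, some of type "thinking".
    content = message.get("content")
    if isinstance(content, list):
        return "".join(
            block.get("thinking", "")  # assumed key name, see lead-in
            for block in content
            if isinstance(block, dict) and block.get("type") == "thinking"
        )
    return ""
```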


Vision & image input

Models with supports_vision: true accept images embedded as content blocks. Either a public URL or a base64 data URL works; size limits depend on the upstream model.

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
      ]
    }]
  }'
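When the image is a local file rather than a public URL, it can be embedded as a base64 data URL. This is a minimal sketch; the helper names are illustrative, and the MIME type has to match the actual image data.

```python
import base64


def image_block_from_bytes(data: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes in an image_url content block using a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_message(prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a user message that pairs a text prompt with one image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            image_block_from_bytes(image_bytes, mime_type),
        ],
    }
```

Reading the file with open(path, "rb").read() and passing the bytes to vision_message yields a message that can go straight into the messages array.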

Tool calling

Models with supports_tools: true can call functions you define. The model returns a tool_calls array; you run the call, then send the result back in a tool message.

Request

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }'

Response with tool call

{
  "id": "chatcmpl-abc123",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"Paris\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

Follow-up with tool result

{
  "model": "gpt-5.1-chat",
  "messages": [
    {"role": "user", "content": "What is the weather in Paris?"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"}
      }]
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp_c\": 14, \"sky\": \"cloudy\"}"}
  ]
}
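The step between the two payloads above, running each requested call and packaging its result, can be sketched as follows. get_weather here is a hypothetical local stub that returns fixed data, and the TOOLS registry is an illustrative convention rather than anything the API requires.

```python
import json


def get_weather(location: str) -> dict:
    """Hypothetical stub; a real implementation would query a weather service."""
    return {"temp_c": 14, "sky": "cloudy"}


# Maps the function names declared in "tools" to local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list:
    """Execute every tool call in an assistant message and build the tool replies."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        function = TOOLS[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not as an object.
        arguments = json.loads(call["function"]["arguments"])
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must echo the id of the call being answered
            "content": json.dumps(function(**arguments)),
        })
    return replies
```

The follow-up request is then the previous messages, plus the assistant message, plus everything run_tool_calls returns.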

Streaming

Set stream: true to receive partial completions as Server-Sent Events. Each event is one JSON chunk with the same shape as the non-streamed response, except message is replaced by delta. The stream ends with data: [DONE].

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "Write a haiku about Berlin."}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Wire format

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"content":"Cold "},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"content":"stone "},"finish_reason":null}]}


data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":17,"total_tokens":29}}

data: [DONE]
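A minimal sketch of consuming this wire format: it takes any iterable of text lines (for example, a decoded HTTP response body read line by line), concatenates the content deltas, and keeps the usage block if one arrives. The function name is illustrative.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Accumulate streamed text and the optional usage block from SSE lines."""
    parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts), usage
```

In an interactive client each piece would be printed as it arrives instead of being collected.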

POST /v1/messages

Anthropic-compatible Messages API. Works with the official @anthropic-ai/sdk by setting baseURL to https://api.airforce. Requests for non-Claude models are forwarded transparently to OpenAI, Google, and other upstream providers.

POST https://api.airforce/v1/messages

Request body

Parameter       Type            Required  Description
model           string          Required  Model ID (Anthropic-format or routed alias).
messages        array           Required  Each entry: { role: "user" | "assistant", content: string | array }.
max_tokens      integer         Required  Required by Anthropic. Token cap for the response.
system          string | array  Optional  System prompt. Pass an array of { type: "text", text, cache_control? } blocks to mark cached prefix segments. See "Prompt caching".
temperature     float           Optional  0–1.
top_p           float           Optional  Nucleus sampling.
top_k           integer         Optional  Limit sampling pool to top-K tokens.
stop_sequences  array           Optional  Up to 4 stop sequences.
stream          boolean         Optional  When true, emits Anthropic-style SSE event stream (see "Streaming").
tools           array           Optional  Anthropic tool definitions: { name, description, input_schema }. The response may contain tool_use content blocks.
tool_choice     object          Optional  { type: "auto" | "any" | "tool", name? }.
thinking        object          Optional  Anthropic extended thinking: { type: "enabled", budget_tokens: N }.

Example

curl https://api.airforce/v1/messages \
  -H "x-api-key: sk-air-YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "max_tokens": 256,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'

Response shape

Field        Type    Required  Description
id           string  Optional  Message ID, e.g. "msg_01ABCxyz".
type         string  Optional  Always "message".
role         string  Optional  Always "assistant".
content      array   Optional  Array of content blocks: { type: "text" | "tool_use" | "thinking", … }.
model        string  Optional  Echo of requested model.
stop_reason  string  Optional  "end_turn" | "max_tokens" | "stop_sequence" | "tool_use".
usage        object  Optional  { input_tokens, output_tokens, cache_read_input_tokens?, cache_creation_input_tokens?, cache_creation? }. Cache fields appear when prompt caching was used. cache_creation.ephemeral_5m_input_tokens and ephemeral_1h_input_tokens give the per-TTL write breakdown.

Streaming events

Anthropic SSE uses named events instead of one-off JSON chunks. Each event has both an event: name and a data: JSON payload.

event: message_start
data: {"type":"message_start","message":{"id":"msg_01","role":"assistant","content":[],"model":"claude-sonnet-4.6","stop_reason":null,"usage":{"input_tokens":12,"output_tokens":1}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":17}}

event: message_stop
data: {"type":"message_stop"}
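A sketch of reading the named-event stream shown above. It tracks the most recent event: line, decodes the data: line that follows, and keeps the text deltas, the stop reason, and the final output token count. The function name and the returned dictionary layout are illustrative.

```python
import json
from typing import Iterable


def collect_message_stream(lines: Iterable[str]) -> dict:
    """Accumulate text, stop_reason and output_tokens from Anthropic-style SSE lines."""
    parts = []
    stop_reason = None
    output_tokens = None
    event = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = json.loads(line[len("data:"):].strip())
            if event == "content_block_delta":
                delta = data.get("delta", {})
                if delta.get("type") == "text_delta":
                    parts.append(delta.get("text", ""))
            elif event == "message_delta":
                stop_reason = data.get("delta", {}).get("stop_reason")
                output_tokens = data.get("usage", {}).get("output_tokens")
            elif event == "message_stop":
                break
    return {"text": "".join(parts), "stop_reason": stop_reason, "output_tokens": output_tokens}
```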

Prompt caching

On /v1/messages with Claude models, mark a prefix as cached by passing system as an array of blocks where the cached segment carries cache_control: { type: "ephemeral" }. Subsequent requests that begin with the same prefix are charged the cheaper cache-read rate for that prefix. Models with supports_caching: true in /v1/models support this.

{
  "model": "claude-sonnet-4.6",
  "max_tokens": 1024,
  "system": [
    {"type": "text", "text": "You are a senior staff engineer at Airforce."},
    {
      "type": "text",
      "text": "<repository-snapshot>...</repository-snapshot>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [
    {"role": "user", "content": "Where is rate limiting enforced?"}
  ]
}
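Building the system array by hand is error-prone once several requests share a prefix, so a small helper can keep the marker consistent. This is a sketch with an illustrative function name; the optional ttl value follows the "Cache duration" section of this page.

```python
def cached_system(instructions: str, shared_prefix: str, ttl: str = "5m") -> list:
    """Return a system array whose second block is marked as a cached prefix.

    ttl is "5m" (the default lifetime, no extra field needed) or "1h".
    """
    if ttl not in ("5m", "1h"):
        raise ValueError("ttl must be '5m' or '1h'")
    marker = {"type": "ephemeral"}
    if ttl == "1h":
        marker["ttl"] = "1h"
    return [
        {"type": "text", "text": instructions},
        {"type": "text", "text": shared_prefix, "cache_control": marker},
    ]
```

Because the cached prefix must match exactly between calls, generating it from one function and one source string helps avoid accidental cache misses.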

How cache counts are reported in the response

Cache token counts are passed through in each format's native shape, so SDKs (openai, @anthropic-ai/sdk, @google/genai) read them without custom code. Fields are omitted when the value is zero, keeping non-cached responses lean.

/v1/chat/completions (OpenAI shape)

"usage": {
  "prompt_tokens": 2104,
  "completion_tokens": 147,
  "total_tokens": 2251,
  "prompt_tokens_details": { "cached_tokens": 1980 },
  "cache_creation_input_tokens": 124,
  "cache_creation": {
    "ephemeral_5m_input_tokens": 124,
    "ephemeral_1h_input_tokens": 0
  }
}

/v1/messages (Anthropic shape)

"usage": {
  "input_tokens": 2104,
  "output_tokens": 147,
  "cache_read_input_tokens": 1980,
  "cache_creation_input_tokens": 124,
  "cache_creation": {
    "ephemeral_5m_input_tokens": 124,
    "ephemeral_1h_input_tokens": 0
  }
}

/v1beta/.../generateContent (Gemini shape)

"usageMetadata": {
  "promptTokenCount": 2104,
  "candidatesTokenCount": 147,
  "totalTokenCount": 2251,
  "cachedContentTokenCount": 1980
}
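Code that talks to more than one of these endpoints may want a single view of cache activity. The sketch below maps each of the three usage shapes shown above onto read and write counts; it detects the shape from the field names in the samples and treats missing fields as zero. The function name is illustrative, and the Gemini sample reports no write count, so writes default to zero there.

```python
def cache_counts(usage: dict) -> dict:
    """Normalise a usage / usageMetadata object to {"read": int, "write": int}."""
    if "promptTokenCount" in usage:
        # Gemini shape: only cache reads are reported in the sample above.
        return {"read": usage.get("cachedContentTokenCount", 0), "write": 0}
    if "prompt_tokens" in usage:
        # OpenAI shape: reads are nested, writes are flat.
        details = usage.get("prompt_tokens_details") or {}
        return {
            "read": details.get("cached_tokens", 0),
            "write": usage.get("cache_creation_input_tokens", 0),
        }
    # Anthropic shape: both counts are flat fields.
    return {
        "read": usage.get("cache_read_input_tokens", 0),
        "write": usage.get("cache_creation_input_tokens", 0),
    }
```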

Where caching applies

Explicit cache_control markers are honored on /v1/messages and /v1/chat/completions for Claude models — put them on system or message content blocks. Many other providers (OpenAI-family, DeepSeek, Gemini) cache automatically: you send no markers and simply see cached_tokens in the response once a long-enough prefix is reused.

Cache duration: 5 minutes or 1 hour

A cached prefix lives for 5 minutes by default and the timer refreshes on every hit. For a longer-lived prefix, add ttl: "1h" to the marker. The response reports each TTL separately under cache_creation.

"cache_control": { "type": "ephemeral", "ttl": "1h" }

Worked example: first write, then read

Send the exact same request twice (the caching example above). The first call that sees the prefix pays a one-time cache write; identical calls within the TTL pay the much cheaper cache read.

First call — cache write (usage excerpt):

"usage": {
  "input_tokens": 2104,
  "output_tokens": 12,
  "cache_creation_input_tokens": 1980,
  "cache_read_input_tokens": 0
}

Second identical call within the TTL — cache read:

"usage": {
  "input_tokens": 2104,
  "output_tokens": 12,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 1980
}

Limits & cost

  • Claude requires a minimum cacheable prefix (about 1024 tokens; larger for some models). Shorter prefixes are simply not cached.
  • Up to 4 cache breakpoints per request, and the cached prefix must be byte-identical across calls — even a one-character change misses the cache.
  • Cache writes cost more than normal input (5m ≈ 1.25×, 1h ≈ 2×); cache reads cost much less (≈ 0.1×). See each model's cache prices on the pricing page.
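To see when caching pays off, the approximate multipliers above can be turned into "input-token equivalents". The sketch assumes the four token counts are disjoint and that the multipliers apply exactly as listed; actual prices are per model, so this is an estimate only. The function name is illustrative.

```python
# Approximate multipliers relative to the normal input-token price (from the list above).
WRITE_5M = 1.25
WRITE_1H = 2.0
READ = 0.1


def effective_input_tokens(uncached: int, cache_read: int = 0,
                           write_5m: int = 0, write_1h: int = 0) -> float:
    """Estimate input cost, expressed in normal-price input-token equivalents."""
    return uncached + write_5m * WRITE_5M + write_1h * WRITE_1H + cache_read * READ


# Hypothetical request: a 1,980-token cached prefix plus 124 uncached tokens.
first_call = effective_input_tokens(124, write_5m=1980)    # pays the cache write
repeat_call = effective_input_tokens(124, cache_read=1980)  # pays the cache read
no_caching = effective_input_tokens(124 + 1980)             # same prompt, no cache
```

Under these assumptions the first call costs about 2,599 token equivalents against 2,104 uncached, and each repeat within the TTL costs about 322, so with a 5-minute TTL the write premium is recovered on the second call.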

POST /v1/responses

OpenAI Responses-API surface for stateful conversations. Same Bearer/x-api-key auth. Cache counts surface as input_tokens_details.cached_tokens (read) plus the flat cache_creation_input_tokens + cache_creation.ephemeral_* (writes) for parity with /v1/chat/completions.

POST https://api.airforce/v1/responses

Errors

Airforce returns standard HTTP status codes and a uniform error envelope across all endpoints.

Status  Type                  Description
400     invalid_request       Malformed JSON, missing required field, unknown model.
401     authentication_error  Missing or invalid API key.
403     permission_error      Plan or per-key permissions deny this request.
429     rate_limit            Request rate or daily token cap exceeded.
503     upstream_error        All upstream keys for the requested provider failed.
{
  "error": {
    "message": "Model 'gpt-99' not found.",
    "type": "invalid_request",
    "param": "model",
    "code": "model_not_found"
  }
}
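A sketch of turning the envelope into a typed value on the client side. The ApiError class and the retry policy are illustrative: treating 429 and 503 as retryable is a common client convention and an assumption here, not something this page prescribes.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: rate limits and upstream failures are transient; the rest are not.
RETRYABLE_STATUSES = frozenset({429, 503})


@dataclass
class ApiError:
    status: int
    type: str
    message: str
    param: Optional[str] = None
    code: Optional[str] = None

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def parse_error(status: int, body: dict) -> ApiError:
    """Build an ApiError from an HTTP status and the decoded error envelope."""
    err = body.get("error") or {}
    return ApiError(
        status=status,
        type=err.get("type", "unknown"),
        message=err.get("message", ""),
        param=err.get("param"),
        code=err.get("code"),
    )
```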

Discover models

See the full list of model IDs and their capability flags (vision, tools, reasoning, caching, context length, …) at /docs/api/models.

curl https://api.airforce/v1/models \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"
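Once the listing is fetched, the capability flags mentioned throughout this page (supports_vision, supports_tools, supports_reasoning, supports_caching) can be used to filter it. The sketch assumes an OpenAI-style envelope with the models under a "data" array and the flags as booleans on each entry; the exact response layout is not shown on this page, so verify it against a real response. The function name and the model IDs in the sample are illustrative.

```python
def models_with(listing: dict, *flags: str) -> list:
    """Return the IDs of models for which every given capability flag is true."""
    return [
        model["id"]
        for model in listing.get("data", [])  # assumed envelope key
        if all(model.get(flag) is True for flag in flags)
    ]


# Hypothetical listing, for illustration only.
sample_listing = {
    "data": [
        {"id": "model-a", "supports_vision": True, "supports_tools": True},
        {"id": "model-b", "supports_vision": False, "supports_tools": True},
        {"id": "model-c", "supports_vision": True},
    ]
}
vision_and_tools = models_with(sample_listing, "supports_vision", "supports_tools")
```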