Chat completions

Generate chat responses across 100+ models from one API. Drop-in compatible with OpenAI Chat Completions, OpenAI Responses, and Anthropic Messages.

Airforce speaks both the OpenAI Chat Completions and the Anthropic Messages wire formats over the same set of models. Pick whichever SDK you already use and only change the base URL — non-Claude models are forwarded transparently behind either surface.

This page covers authentication, the request and response shapes for both surfaces, streaming, tool calling, vision, reasoning, and prompt caching. New here? Start with the basic example below, get one call working, then layer on streaming, tools or caching once it does.

Authentication

Every request needs a Bearer token (your Airforce API key). The Anthropic x-api-key header is also accepted on /v1/messages for SDK compatibility.

Authorization: Bearer sk-air-YOUR_API_KEY
# alt for /v1/messages:
x-api-key: sk-air-YOUR_API_KEY

POST /v1/chat/completions

OpenAI-compatible Chat Completions. Works with the official openai SDK by overriding base_url to https://api.airforce/v1.

POST https://api.airforce/v1/chat/completions

Request body

Parameter         Type             Required  Description
model             string           Required  Model ID. Use GET /v1/models to discover available IDs.
messages          array            Required  Conversation history. Each entry has { role: "system" | "user" | "assistant" | "tool", content }. Content is a string or an array of content blocks (vision, see below).
max_tokens        integer          Optional  Maximum number of tokens to generate. Capped at the model's max_output_tokens.
temperature       float            Optional  Sampling temperature, 0–2. Lower is more deterministic. Default depends on the upstream provider.
top_p             float            Optional  Nucleus sampling. Use either temperature or top_p, not both.
stream            boolean          Optional  When true, the response is a stream of Server-Sent Events. See "Streaming" below.
models            array            Optional  Fallback models (max 3), e.g. ["deepseek-v3.2", "gpt-4o-mini"]. If every channel of the primary model fails, each candidate is tried in order. You are billed for the model that actually answered, and response.model reports it. Unknown or plan-gated candidates are skipped. With the OpenAI SDK, pass it via extra_body.
transforms        array            Optional  Prompt transforms. Supported: ["middle-out"]. When the conversation overflows the model's context window, whole messages are dropped from the middle (system prompts, the first message and the most recent turns are kept), so long roleplay or agent histories keep working instead of erroring. Opt-in; off by default.
stream_options    object           Optional  { include_usage: boolean }. Usage is always included on the final streaming chunk; this field is accepted for OpenAI compatibility but cannot turn it off.
stop              string | array   Optional  Up to 4 stop sequences. Generation halts as soon as one is produced.
tools             array            Optional  Function definitions the model may call. See "Tool calling" below.
tool_choice       string | object  Optional  "auto" (default), "none", or { type: "function", function: { name } } to force a specific call.
response_format   object           Optional  { type: "json_object" } makes the model emit valid JSON; { type: "json_schema", json_schema } steers it toward your schema. Ignored for models that do not support it. See "Structured outputs".
reasoning_effort  string           Optional  OpenAI o1/o3-style reasoning depth: "low" | "medium" | "high". See "Reasoning & thinking".
thinking          string | object  Optional  Cross-provider thinking switch. "on" | "off" | "auto", or Anthropic-shape { type: "enabled", budget_tokens: N }. See "Reasoning & thinking".
thinking_budget   integer          Optional  Token cap for the model's reasoning trace (when the provider exposes one).
ignore_defaults   boolean          Optional  Skip your saved per-model default parameters (configured in the dashboard) for this request.
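
To make the middle-out behaviour in the transforms row concrete, here is a minimal client-side sketch. It is not Airforce's implementation: the estimate_tokens heuristic (about 4 characters per token), the keep_recent count, the token budget, and both function names are assumptions made purely for illustration.

```python
def estimate_tokens(message):
    """Rough size estimate: about 4 characters per token (an assumption)."""
    content = message.get("content") or ""
    if not isinstance(content, str):
        content = str(content)
    return max(1, len(content) // 4)


def middle_out(messages, budget, keep_recent=2):
    """Drop whole messages from the middle until the estimate fits `budget`.

    System prompts, the first non-system message, and the last `keep_recent`
    messages are always kept. Hypothetical helper, for illustration only.
    """
    first_turn = next(
        (i for i, m in enumerate(messages) if m["role"] != "system"), None
    )
    protected = {i for i, m in enumerate(messages) if m["role"] == "system"}
    if first_turn is not None:
        protected.add(first_turn)
    protected.update(range(max(0, len(messages) - keep_recent), len(messages)))

    droppable = [i for i in range(len(messages)) if i not in protected]
    kept = set(range(len(messages)))
    total = sum(estimate_tokens(m) for m in messages)

    # Remove from the centre of the droppable run, working outwards.
    while total > budget and droppable:
        idx = droppable.pop(len(droppable) // 2)
        kept.discard(idx)
        total -= estimate_tokens(messages[idx])

    return [m for i, m in enumerate(messages) if i in kept]
```

If the protected messages alone exceed the budget, this sketch returns them anyway; how the service handles that case is not stated here.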

Basic example

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 200,
    "temperature": 0.7
  }'
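
The same call can be made from Python without any SDK. The sketch below uses only the standard library; build_chat_request and send are hypothetical helper names, and the request is built separately from sending so it can be inspected first.

```python
import json
import urllib.request

API_BASE = "https://api.airforce/v1"


def build_chat_request(api_key, model, messages, **params):
    """Build (but do not send) a POST request for /chat/completions."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling send(build_chat_request(key, "gpt-5.1-chat", messages, max_tokens=200, temperature=0.7)) should return the response object described under "Response shape"; the answer text is at ["choices"][0]["message"]["content"].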

Response shape

Field                    Type     Required  Description
id                       string   Optional  Stable completion ID, e.g. "chatcmpl-abc123".
object                   string   Optional  "chat.completion" for non-streamed, "chat.completion.chunk" for streamed.
created                  integer  Optional  Unix timestamp (seconds).
model                    string   Optional  ID of the model that answered. Matches the requested model unless a fallback from models was used.
choices                  array    Optional  Array of completion candidates: [{ index, message: { role, content, tool_calls? }, finish_reason }].
choices[].finish_reason  string   Optional  "stop" | "length" | "tool_calls" | "content_filter".
usage                    object   Optional  { prompt_tokens, completion_tokens, total_tokens, completion_tokens_details?, prompt_tokens_details?, cache_creation_input_tokens?, cache_creation? }. completion_tokens_details.reasoning_tokens is set when the model produced a reasoning trace. Cache fields appear when the upstream returned prompt-caching info: prompt_tokens_details.cached_tokens reports cache reads (OpenAI standard), cache_creation_input_tokens aggregates writes, and cache_creation.ephemeral_5m_input_tokens / ephemeral_1h_input_tokens give the TTL split.
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-5.1-chat",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The capital of France is Paris."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 8,
    "total_tokens": 28
  }
}
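
Because the models parameter lets a fallback answer in place of the primary model, it can be useful to check which model actually replied. A small sketch follows; with_fallbacks and answered_by_fallback are hypothetical helper names, and the comparison assumes the service returns the same ID string that was requested when no fallback was used.

```python
def with_fallbacks(payload, fallbacks):
    """Return a copy of `payload` with a `models` fallback list (max 3)."""
    if len(fallbacks) > 3:
        raise ValueError("at most 3 fallback models are allowed")
    return {**payload, "models": list(fallbacks)}


def answered_by_fallback(payload, response):
    """True when the response's model differs from the one requested."""
    return response.get("model") != payload["model"]
```

With the OpenAI Python SDK, the same list would go in extra_body, for example extra_body={"models": ["deepseek-v3.2", "gpt-4o-mini"]}.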

Reasoning & thinking

Models that support extended reasoning expose a thinking trace alongside the regular output. Airforce normalises three different upstream conventions into one set of canonical parameters that work on every reasoning-capable model.

Check supports_reasoning: true on a model in GET /v1/models to know which IDs accept these parameters.

Canonical parameters

Parameter         Type             Required  Description
reasoning_effort  string           Optional  "low" | "medium" | "high". OpenAI o1/o3, GPT-5 reasoning models, and any router that maps onto them.
thinking          string | object  Optional  "on" | "off" | "auto" for a quick toggle, or { type: "enabled", budget_tokens: N } for the Anthropic-native shape. Maps to Claude extended thinking, Gemini thinking, and DeepSeek reasoning.
thinking_budget   integer          Optional  Maximum tokens the model may spend reasoning before emitting visible output. Mirrors budget_tokens.

Reasoning effort (OpenAI-style)

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "o3-mini",
    "messages": [{"role": "user", "content": "Prove the Pythagorean theorem."}],
    "reasoning_effort": "high"
  }'

Extended thinking (Anthropic-style)

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "Plan a 7-day Italy trip."}],
    "thinking": {"type": "enabled", "budget_tokens": 4000}
  }'

The reasoning trace itself appears in choices[0].message.reasoning (OpenAI shape) or as thinking blocks in content (Anthropic shape). Reasoning tokens are billed and reported in usage.completion_tokens_details.reasoning_tokens.

That completion_tokens_details.reasoning_tokens breakdown is only present when the upstream provider reports it. On a streamed response the trace arrives on delta.reasoning_content per chunk.
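
Since the trace can arrive in either of the two shapes described above, a client may want one helper that handles both. The sketch below is one way to do that for non-streamed responses. It assumes a thinking content block carries its text under a "thinking" key, as in Anthropic's native format; this page does not spell that out, so treat it as an assumption. extract_reasoning is a hypothetical name.

```python
def extract_reasoning(response):
    """Return (reasoning_text, reasoning_tokens) from a non-streamed response.

    Looks in message.reasoning first (OpenAI shape), then in "thinking"
    content blocks (Anthropic shape). reasoning_tokens is None when the
    upstream did not report the breakdown.
    """
    message = response["choices"][0]["message"]
    trace = message.get("reasoning")
    if trace is None and isinstance(message.get("content"), list):
        parts = [
            block.get("thinking", "")  # key name is an assumption
            for block in message["content"]
            if isinstance(block, dict) and block.get("type") == "thinking"
        ]
        trace = "".join(parts) if parts else None
    usage = response.get("usage") or {}
    details = usage.get("completion_tokens_details") or {}
    return trace, details.get("reasoning_tokens")
```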


Vision & image input

Models with supports_vision: true accept images embedded as content blocks. Either a public URL or a base64 data URL works; size limits depend on the upstream model.

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
      ]
    }]
  }'
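
For a local file rather than a public URL, the image goes in as a base64 data URL. A minimal sketch of building that content block; image_block and vision_message are hypothetical helper names, and the MIME type is whatever matches your file.

```python
import base64


def image_block(data, mime_type="image/jpeg"):
    """Wrap raw image bytes as an image_url content block with a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_message(prompt, data, mime_type="image/jpeg"):
    """Build a user message holding one text block and one image block."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            image_block(data, mime_type),
        ],
    }
```

The returned message drops straight into the messages array of the request above; read the bytes with open(path, "rb").read().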

Tool calling

Models with supports_tools: true can call functions you define. The model returns a tool_calls array; you run the call, then send the result back in a tool message.

Request

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }'

Response with tool call

{
  "id": "chatcmpl-abc123",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"Paris\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

Follow-up with tool result

{
  "model": "gpt-5.1-chat",
  "messages": [
    {"role": "user", "content": "What is the weather in Paris?"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"}
      }]
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp_c\": 14, \"sky\": \"cloudy\"}"}
  ]
}
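
The step between the tool-call response and the follow-up request is yours to implement. One possible shape for it is sketched below. The get_weather body is a stand-in that returns fixed data, and TOOLS, run_tool_calls and follow_up_messages are hypothetical names.

```python
import json


def get_weather(location):
    """Stand-in implementation; a real one would call a weather service."""
    return {"temp_c": 14, "sky": "cloudy", "location": location}


TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message, registry=TOOLS):
    """Execute each tool call and return the matching `tool` messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = registry[call["function"]["name"]]
        # `arguments` arrives as a JSON-encoded string, not an object.
        args = json.loads(call["function"]["arguments"] or "{}")
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results


def follow_up_messages(history, assistant_message):
    """History, then the assistant turn, then one tool message per call."""
    return [*history, assistant_message, *run_tool_calls(assistant_message)]
```

A production version would also guard against unknown tool names and malformed arguments before calling anything.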

Structured outputs

Set response_format to make the model return JSON. Two modes are supported:

  • { "type": "json_object" } — the response is a single valid JSON value.
  • { "type": "json_schema", "json_schema": { "name", "schema", "strict" } } — the model is steered to produce JSON that matches your JSON Schema.
curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "Extract the city and country: I live in Paris, France."}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "location",
        "schema": {
          "type": "object",
          "properties": { "city": {"type": "string"}, "country": {"type": "string"} },
          "required": ["city", "country"]
        }
      }
    }
  }'

Reliability: even when a model wraps its answer in prose or a markdown code fence, Airforce extracts the JSON payload, so you receive parseable content whenever valid JSON can be recovered. If none can be recovered, the original text is returned unchanged, so this extraction never makes a response worse. This applies to non-streamed responses only; streamed responses are passed through unchanged.
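
Because streamed responses are passed through unchanged, a client that streams may want a similar recovery step once the full text has been assembled. The sketch below shows one way to do it; it is not Airforce's server-side logic, and extract_json is a hypothetical name. It tries the whole text, then any fenced block, then the first position where an object or array parses.

```python
import json
import re

TICKS = "`" * 3  # a markdown code fence, built this way to keep the listing clean
FENCE = re.compile(TICKS + r"(?:json)?\s*(.*?)" + TICKS, re.DOTALL | re.IGNORECASE)


def extract_json(text):
    """Return (value, True) if a JSON payload is found, else (text, False)."""
    candidates = [text.strip()]
    candidates += [body.strip() for body in FENCE.findall(text)]
    for candidate in candidates:
        try:
            return json.loads(candidate), True
        except ValueError:
            pass
    # Last resort: scan for the first brace or bracket that starts valid JSON.
    decoder = json.JSONDecoder()
    for match in re.finditer(r"[{\[]", text):
        try:
            value, _ = decoder.raw_decode(text, match.start())
            return value, True
        except ValueError:
            continue
    return text, False
```

As with the behaviour described above, the original text comes back untouched when nothing parses, so the caller can decide how to handle it.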


Streaming

Set stream: true to receive partial completions as Server-Sent Events. Each event is one JSON chunk with the same shape as the non-streamed response, except message is replaced by delta. The stream ends with data: [DONE].

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "Write a haiku about Berlin."}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Wire format

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"content":"Cold "},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"content":"stone "},"finish_reason":null}]}


data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":17,"total_tokens":29}}

data: [DONE]
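
A client consumes this format by reading data: lines, stopping at [DONE], and concatenating delta.content. A minimal parser over an iterable of text lines is sketched below; read_chat_stream is a hypothetical name, and in practice the lines would come from your HTTP client's streaming response.

```python
import json


def read_chat_stream(lines):
    """Collect (text, finish_reason, usage) from an iterable of SSE lines."""
    text_parts = []
    finish_reason = None
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                text_parts.append(piece)
            finish_reason = choice.get("finish_reason") or finish_reason
        usage = chunk.get("usage") or usage
    return "".join(text_parts), finish_reason, usage
```

This sketch only gathers text from delta.content; streamed tool calls and reasoning deltas would need their own accumulation.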

POST /v1/messages

Anthropic-compatible Messages API. Works with the official @anthropic-ai/sdk by setting baseURL to https://api.airforce. Forwards to OpenAI/Google/etc. transparently for non-Claude models.

POST https://api.airforce/v1/messages

Request body

Parameter       Type            Required  Description
model           string          Required  Model ID (Anthropic-format or routed alias).
messages        array           Required  Each entry: { role: "user" | "assistant", content: string | array }.
max_tokens      integer         Required  Required by Anthropic. Token cap for the response.
system          string | array  Optional  System prompt. Pass an array of { type: "text", text, cache_control? } blocks to mark cached prefix segments. See "Prompt caching".
temperature     float           Optional  0–1.
top_p           float           Optional  Nucleus sampling.
top_k           integer         Optional  Limit sampling pool to top-K tokens.
stop_sequences  array           Optional  Up to 4 stop sequences.
stream          boolean         Optional  When true, emits Anthropic-style SSE event stream (see "Streaming").
fallbacks       array           Optional  Fallback models (max 3) in Anthropic form: [{"model": "gpt-4o-mini"}]. If every channel of the primary model fails, each candidate is tried in order; you are billed for the model that actually answered, and the response model field reports it. A plain models string array is accepted too.
tools           array           Optional  Anthropic tool definitions: { name, description, input_schema }. The response may contain tool_use content blocks.
tool_choice     object          Optional  { type: "auto" | "any" | "tool", name? }.
thinking        object          Optional  Anthropic extended thinking: { type: "enabled", budget_tokens: N }.

Example

curl https://api.airforce/v1/messages \
  -H "x-api-key: sk-air-YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "max_tokens": 256,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'

Response shape

Field        Type    Required  Description
id           string  Optional  Message ID, e.g. "msg_01ABCxyz".
type         string  Optional  Always "message".
role         string  Optional  Always "assistant".
content      array   Optional  Array of content blocks: { type: "text" | "tool_use" | "thinking", … }.
model        string  Optional  ID of the model that answered. Matches the requested model unless a fallback was used.
stop_reason  string  Optional  "end_turn" | "max_tokens" | "stop_sequence" | "tool_use".
usage        object  Optional  { input_tokens, output_tokens, cache_read_input_tokens?, cache_creation_input_tokens?, cache_creation? }. Cache fields appear when prompt caching was used. cache_creation.ephemeral_5m_input_tokens and ephemeral_1h_input_tokens give the per-TTL write breakdown.

Streaming events

Unlike the OpenAI-style stream, Anthropic SSE uses named events. Each event has both an event: name and a data: JSON payload.

event: message_start
data: {"type":"message_start","message":{"id":"msg_01","role":"assistant","content":[],"model":"claude-sonnet-4.6","stop_reason":null,"usage":{"input_tokens":12,"output_tokens":1}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":17}}

event: message_stop
data: {"type":"message_stop"}
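
Since each payload repeats the event name in its type field, a client can fold the stream by reading only the data: lines. A minimal sketch over an iterable of text lines follows; read_message_stream is a hypothetical name, and only text_delta content is collected here.

```python
import json


def read_message_stream(lines):
    """Fold Anthropic-style SSE lines into (text, stop_reason, usage)."""
    text_parts, stop_reason, usage = [], None, {}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # the event name is repeated in the payload's "type"
        event = json.loads(line[len("data:"):].strip())
        kind = event.get("type")
        if kind == "message_start":
            usage.update(event["message"].get("usage", {}))
        elif kind == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                text_parts.append(delta.get("text", ""))
        elif kind == "message_delta":
            stop_reason = event.get("delta", {}).get("stop_reason", stop_reason)
            usage.update(event.get("usage", {}))  # later counts overwrite earlier
        elif kind == "message_stop":
            break
    return "".join(text_parts), stop_reason, usage
```

Run against the events shown above, the final usage combines input_tokens from message_start with the output_tokens reported on message_delta.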

POST /v1/messages/count_tokens

Anthropic-compatible token counting. Send the same system / messages / tools you would pass to /v1/messages and get an input-token estimate back without running the model — nothing is billed.

POST https://api.airforce/v1/messages/count_tokens
curl https://api.airforce/v1/messages/count_tokens \
  -H "x-api-key: sk-air-YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "Hello, Claude!"}]
  }'

# → {"input_tokens": 34}

The count is a fast character-based estimate (about 4 characters per token) over system, messages and tools — close enough for context-budget checks, not an exact tokenizer run.
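
The 4-characters-per-token rule is easy to approximate locally for a quick budget check before sending anything. The sketch below is an assumption-laden stand-in, not the endpoint's algorithm: how the service serialises content blocks and tools, and whether it adds per-message overhead, is not documented here, so the numbers will not necessarily match what the endpoint returns. estimate_input_tokens is a hypothetical name.

```python
import json


def estimate_input_tokens(system=None, messages=(), tools=()):
    """Approximate input tokens as total characters / 4, rounded up."""
    def chars(value):
        if value is None:
            return 0
        if isinstance(value, str):
            return len(value)
        # Non-string content (blocks, tool schemas) is measured as JSON text.
        return len(json.dumps(value, ensure_ascii=False))

    total = chars(system)
    total += sum(chars(m.get("content")) for m in messages)
    total += sum(chars(t) for t in tools)
    return -(-total // 4)  # ceiling division
```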


Prompt caching

On /v1/messages with Claude models, mark a prefix as cached by passing system as an array of blocks where the cached segment carries cache_control: { type: "ephemeral" }. Subsequent requests that begin with the same prefix are charged at the cheaper cache-read rate. Models with supports_caching: true in /v1/models support this.

{
  "model": "claude-sonnet-4.6",
  "max_tokens": 1024,
  "system": [
    {"type": "text", "text": "You are a senior staff engineer at Airforce."},
    {
      "type": "text",
      "text": "<repository-snapshot>...</repository-snapshot>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [
    {"role": "user", "content": "Where is rate limiting enforced?"}
  ]
}

How cache counts are reported in the response

Cache token counts are passed through in each format's native shape, so SDKs (openai, @anthropic-ai/sdk, @google/genai) read them without custom code. Fields are omitted when the value is zero, keeping non-cached responses lean.

/v1/chat/completions (OpenAI shape)

"usage": {
  "prompt_tokens": 2104,
  "completion_tokens": 147,
  "total_tokens": 2251,
  "prompt_tokens_details": { "cached_tokens": 1980 },
  "cache_creation_input_tokens": 124,
  "cache_creation": {
    "ephemeral_5m_input_tokens": 124,
    "ephemeral_1h_input_tokens": 0
  }
}

/v1/messages (Anthropic shape)

"usage": {
  "input_tokens": 2104,
  "output_tokens": 147,
  "cache_read_input_tokens": 1980,
  "cache_creation_input_tokens": 124,
  "cache_creation": {
    "ephemeral_5m_input_tokens": 124,
    "ephemeral_1h_input_tokens": 0
  }
}

/v1beta/.../generateContent (Gemini shape)

"usageMetadata": {
  "promptTokenCount": 2104,
  "candidatesTokenCount": 147,
  "totalTokenCount": 2251,
  "cachedContentTokenCount": 1980
}

Where caching applies

Explicit cache_control markers are honored on /v1/messages and /v1/chat/completions for Claude models — put them on system or message content blocks. Many other providers (OpenAI-family, DeepSeek, Gemini) cache automatically: you send no markers and simply see cached_tokens in the response once a long-enough prefix is reused.

Cache duration: 5 minutes or 1 hour

A cached prefix lives for 5 minutes by default and the timer refreshes on every hit. For a longer-lived prefix, add ttl: "1h" to the marker. The response reports each TTL separately under cache_creation.

"cache_control": { "type": "ephemeral", "ttl": "1h" }

Worked example: first write, then read

Send the exact same request twice (the caching example above). The first call that sees the prefix pays a one-time cache write; identical calls within the TTL pay the much cheaper cache read.

First call — cache write (usage excerpt):

"usage": {
  "input_tokens": 2104,
  "output_tokens": 12,
  "cache_creation_input_tokens": 1980,
  "cache_read_input_tokens": 0
}

Second identical call within the TTL — cache read:

"usage": {
  "input_tokens": 2104,
  "output_tokens": 12,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 1980
}

Limits & cost

  • Claude requires a minimum cacheable prefix (about 1024 tokens; larger for some models). Shorter prefixes are simply not cached.
  • Up to 4 cache breakpoints per request, and the cached prefix must be byte-identical across calls — even a one-character change misses the cache.
  • Cache writes cost more than normal input (5m ≈ 1.25×, 1h ≈ 2×); cache reads cost much less (≈ 0.1×). See each model's cache prices on the pricing page.
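
To see what those multipliers mean in practice, the arithmetic can be sketched as below. The rates are the approximate figures from the list above, not actual prices, and the break-even calculation assumes that without caching the prefix would be resent at the normal input rate on every call. Both function names are hypothetical.

```python
import math

WRITE_5M = 1.25  # approximate multipliers quoted in the list above
WRITE_1H = 2.0
READ = 0.1


def cache_cost_units(write_5m=0, write_1h=0, read=0):
    """Cost of the cached tokens, in units of one normal input token."""
    return write_5m * WRITE_5M + write_1h * WRITE_1H + read * READ


def break_even_reads(write_rate=WRITE_5M):
    """Smallest number of cache reads after which caching is cheaper.

    One write plus n reads costs write_rate + n * READ per prefix token,
    against 1 + n without caching, so caching wins once
    n > (write_rate - 1) / (1 - READ).
    """
    return math.floor((write_rate - 1) / (1 - READ)) + 1
```

For the 1980-token prefix in the worked example, the first call costs about 2475 input-token equivalents for the cached part and each later read about 198. Under these assumptions a 5-minute cache pays for itself on the first reuse and a 1-hour cache on the second.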

POST /v1/responses

OpenAI Responses-API surface for stateful conversations. Same Bearer/x-api-key auth. Cache counts surface as input_tokens_details.cached_tokens (read) plus the flat cache_creation_input_tokens + cache_creation.ephemeral_* (writes) for parity with /v1/chat/completions.

POST https://api.airforce/v1/responses

POST /v1beta/models/{model}:generateContent

Google Gemini-compatible endpoint. Works with the official @google/genai SDK and the Gemini CLI by pointing the base URL at https://api.airforce/v1beta. Any routed model works — requests are translated to and from the native Gemini shape, and the model is taken from the URL path (not the body).

POST https://api.airforce/v1beta/models/{model}:generateContent

Authentication

Pass your Airforce API key in any of the three ways Google clients use:

# 1) query parameter (Google default)
?key=sk-air-YOUR_API_KEY

# 2) header
x-goog-api-key: sk-air-YOUR_API_KEY

# 3) bearer token
Authorization: Bearer sk-air-YOUR_API_KEY

Request body

Parameter          Type    Required  Description
contents           array   Required  Conversation turns. Each: { role: "user" | "model", parts: [...] }. A part is { text }, { functionCall: { name, args } }, or { functionResponse: { name, response } }. "model" is Gemini's term for the assistant role.
systemInstruction  object  Optional  System prompt: { parts: [{ text }] }.
generationConfig   object  Optional  { temperature, maxOutputTokens, topP, stopSequences }, mapped to the canonical sampling parameters.
tools              array   Optional  Tool definitions: [{ functionDeclarations: [{ name, description, parameters }] }]. functionDeclarations are flattened across entries.
toolConfig         object  Optional  Tool-choice control: { functionCallingConfig: { mode: "AUTO" | "ANY" | "NONE" } }. ANY forces a call, NONE disables tools.

Example

curl "https://api.airforce/v1beta/models/gemini-3.1-pro:generateContent" \
  -H "x-goog-api-key: sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "What is the capital of France?"}]}
    ],
    "systemInstruction": {"parts": [{"text": "You are a helpful assistant."}]},
    "generationConfig": {"temperature": 0.7, "maxOutputTokens": 256}
  }'

Response shape

Field                      Type    Required  Description
candidates                 array   Optional  Generated turns: [{ content: { role: "model", parts }, finishReason, index }]. Only the first candidate is populated.
candidates[].finishReason  string  Optional  "STOP" | "MAX_TOKENS" | "SAFETY" | "OTHER".
usageMetadata              object  Optional  { promptTokenCount, candidatesTokenCount, totalTokenCount, cachedContentTokenCount? }. cachedContentTokenCount appears when the upstream reported a cache read.
modelVersion               string  Optional  Echo of the requested model.
{
  "candidates": [{
    "content": {
      "role": "model",
      "parts": [{"text": "The capital of France is Paris."}]
    },
    "finishReason": "STOP",
    "index": 0
  }],
  "usageMetadata": {
    "promptTokenCount": 16,
    "candidatesTokenCount": 8,
    "totalTokenCount": 24
  },
  "modelVersion": "gemini-3.1-pro"
}

POST /v1beta/models/{model}:streamGenerateContent

Streaming uses the :streamGenerateContent action and returns Server-Sent Events. Each data: line is a full Gemini-shaped chunk (not a delta object); the final chunk carries usageMetadata.

data: {"candidates":[{"content":{"role":"model","parts":[{"text":"The capital"}]},"index":0}],"modelVersion":"gemini-3.1-pro"}

data: {"candidates":[{"content":{"role":"model","parts":[{"text":" is Paris."}]},"index":0}],"modelVersion":"gemini-3.1-pro"}

data: {"candidates":[{"content":{"role":"model","parts":[]},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":16,"candidatesTokenCount":8,"totalTokenCount":24}}
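
Each chunk here is a complete Gemini-shaped object, so a client joins the text parts across chunks instead of reading a delta field. A minimal sketch over an iterable of text lines; read_gemini_stream is a hypothetical name.

```python
import json


def read_gemini_stream(lines):
    """Join text parts across chunks; return (text, finish_reason, usage)."""
    text_parts, finish_reason, usage = [], None, None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators between events
        chunk = json.loads(line[len("data:"):].strip())
        for candidate in chunk.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                if "text" in part:
                    text_parts.append(part["text"])
            finish_reason = candidate.get("finishReason", finish_reason)
        usage = chunk.get("usageMetadata", usage)
    return "".join(text_parts), finish_reason, usage
```

Parts that carry a functionCall instead of text are skipped by this sketch and would need separate handling.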

List models

The catalog is also exposed in Gemini Model-resource shape so Google clients can enumerate models.

curl https://api.airforce/v1beta/models

Notes: the base URL is https://api.airforce/v1beta (or /v1), not Google's host. The model name comes from the URL path, not the request body. Only the first candidate is returned, and a subset of Gemini fields is translated — safetySettings and cachedContent are currently ignored. Billing, rate limits and smart routing apply exactly as on /v1/chat/completions.


Errors

Airforce returns standard HTTP status codes and a uniform error envelope for both endpoints.

Status  Type                                      Description
400     invalid_request_error                     Malformed JSON, missing required field, unknown model.
401     invalid_request_error / auth_required     Missing or invalid API key.
402     insufficient_quota                        The model needs an active subscription or a positive Pay-as-you-Go balance.
403     model_access_denied / insufficient_scope  Plan or per-key permissions deny this request.
404     model_not_found                           The requested model does not exist or you do not have access to it.
429     rate_limit_error                          Request rate or daily token cap exceeded.
503     api_error / moderation_unavailable        All upstream keys for the requested provider failed.
{
  "error": {
    "message": "The requested model does not exist or you do not have access to it.",
    "type": "model_not_found",
    "param": null,
    "code": "404"
  }
}

The descriptive slug is in type. code is the HTTP status as a string (e.g. "404"), and param is null except on parameter-range validation errors, where it names the offending parameter.
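
A client can branch on the envelope with a few lines of code. In the sketch below, treating 429 and 503 as retryable is an assumption about sensible client behaviour, not something this page prescribes, and parse_error is a hypothetical name.

```python
import json

RETRYABLE = {429, 503}  # assumption: rate limits and upstream failures


def parse_error(status, body):
    """Return (type, message, param, retryable) from an error response."""
    try:
        data = json.loads(body)
    except ValueError:
        data = None
    error = data.get("error") if isinstance(data, dict) else None
    if not isinstance(error, dict):
        error = {}  # body was not the documented envelope
    return (
        error.get("type"),
        error.get("message"),
        error.get("param"),
        status in RETRYABLE,
    )
```

When the body is not valid JSON or lacks the envelope, the first three values come back as None, so the caller can fall back to the HTTP status alone.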

Discover models

See the full list of model IDs and their capability flags (vision, tools, reasoning, caching, context length, …) at /docs/api/models.

curl https://api.airforce/v1/models \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"