Api.Airforce
API REFERENCE

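Chat Completions

Generate chat responses across more than 100 models through a single API. Drop-in compatible with OpenAI Chat Completions, Anthropic Messages, and OpenAI Responses.

Airforce speaks both the OpenAI Chat Completions and Anthropic Messages wire formats over the same set of models. Pick the SDK you already use and change only the base URL; non-Claude models are routed transparently under either interface.

This page covers authentication, the request and response shapes of both interfaces, streaming, tool calling, vision, reasoning, and prompt caching. New here? Start with the basic example below, get one call working, then layer streaming, tools, or caching on top.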

Authentication

Every request requires a Bearer token (your Airforce API key). Anthropic's x-api-key header is also accepted on /v1/messages for SDK compatibility.

Authorization: Bearer sk-air-YOUR_API_KEY
# alt for /v1/messages:
x-api-key: sk-air-YOUR_API_KEY

POST /v1/chat/completions

OpenAI-compatible chat completions. Works with the official openai SDK by overriding base_url with https://api.airforce/v1.

POST https://api.airforce/v1/chat/completions

Request body

Parameter         Type             Required  Description
model             string           Required  Model ID. Use GET /v1/models to discover available IDs.
messages          array            Required  Conversation history. Each entry has { role: "system" | "user" | "assistant" | "tool", content }. content is a string, or an array of content blocks (for vision/image input, see below).
max_tokens        integer          Optional  Maximum number of tokens to generate. Capped at the model's max_output_tokens.
temperature       float            Optional  Sampling temperature, 0–2. Lower values are more deterministic. The default depends on the upstream provider.
top_p             float            Optional  Nucleus sampling. Use either temperature or top_p, not both.
stream            boolean          Optional  When true, the response is a server-sent event stream. See the streaming section below.
models            array            Optional  Fallback models (max 3), e.g. ["deepseek-v3.2", "gpt-4o-mini"]. If every channel of the primary model fails, each candidate is tried in order. You are billed for the model that actually answered, and response.model reports it. Unknown or plan-gated candidates are skipped. With the OpenAI SDK pass it via extra_body.
transforms        array            Optional  Prompt transforms. Supported: ["middle-out"]. When the conversation overflows the model's context window, whole messages are dropped from the middle (system prompts, the first message and the most recent turns are kept), so long roleplay or agent histories keep working instead of erroring. Opt-in; off by default.
stream_options    object           Optional  { include_usage: boolean }. Usage is always included in the final streamed chunk; this field is accepted for OpenAI compatibility but cannot turn usage off.
stop              string | array   Optional  Up to 4 stop sequences. Generation stops as soon as one of them is produced.
tools             array            Optional  Function definitions the model may call. See the tool calling section below.
tool_choice       string | object  Optional  "auto" (default), "none", or { type: "function", function: { name } } to force a specific call.
response_format   object           Optional  { type: "json_object" } forces the model to emit valid JSON. Ignored by models that do not support it.
reasoning_effort  string           Optional  OpenAI o1/o3-style reasoning depth: "low" | "medium" | "high". See the reasoning and thinking section.
thinking          string | object  Optional  Cross-provider thinking switch: "on" | "off" | "auto", or the Anthropic form { type: "enabled", budget_tokens: N }. See the reasoning and thinking section.
thinking_budget   integer          Optional  Token cap for the model's reasoning trace (when the provider exposes one).
ignore_defaults   boolean          Optional  Skip the per-model default parameters the user has saved (configured in the dashboard) for this request.
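
The middle-out transform listed in the table runs on the server. For intuition only, here is a client-side approximation of the same idea. It is a sketch, not Airforce's implementation: the function name middle_out, the keep_recent parameter, and the rough 4-characters-per-token estimate are all assumptions made for illustration.

```python
def middle_out(messages, max_tokens, keep_recent=2):
    """Drop whole messages from the middle until a rough estimate fits max_tokens."""

    def estimate(msg):
        # Assumption: ~4 characters per token; real tokenizers differ.
        return max(1, len(str(msg.get("content") or "")) // 4)

    # Never drop system prompts, the first non-system message, or the newest turns.
    protected = {i for i, m in enumerate(messages) if m["role"] == "system"}
    first_turn = next((i for i, m in enumerate(messages) if m["role"] != "system"), None)
    if first_turn is not None:
        protected.add(first_turn)
    protected.update(range(max(0, len(messages) - keep_recent), len(messages)))

    droppable = [i for i in range(len(messages)) if i not in protected]
    dropped = set()
    total = sum(estimate(m) for m in messages)
    while total > max_tokens and droppable:
        victim = droppable.pop(len(droppable) // 2)  # middle of what is left
        dropped.add(victim)
        total -= estimate(messages[victim])
    return [m for i, m in enumerate(messages) if i not in dropped]
```

If everything droppable is gone and the estimate still exceeds the budget, the sketch returns the protected messages as they are; how the server behaves in that case is not described on this page.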

Basic example

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 200,
    "temperature": 0.7
  }'
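
The same call can be assembled from Python with only the standard library. This is a minimal sketch: build_chat_request is a hypothetical helper name, and the request is built but not sent, so it can be inspected first.

```python
import json
import urllib.request

BASE_URL = "https://api.airforce/v1"  # base URL given on this page


def build_chat_request(api_key, model, messages, **params):
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "sk-air-YOUR_API_KEY",
    "gpt-5.1-chat",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=200,
    temperature=0.7,
)
# To send it: json.load(urllib.request.urlopen(req))
```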

Response shape

Parameter                Type     Required  Description
id                       string   Optional  Stable completion ID, e.g. "chatcmpl-abc123".
object                   string   Optional  "chat.completion" for non-streaming, "chat.completion.chunk" for streaming.
created                  integer  Optional  Unix timestamp (seconds).
model                    string   Optional  Echoes the requested model ID, or the fallback model that actually answered when models fallbacks are used.
choices                  array    Optional  Array of completion candidates: [{ index, message: { role, content, tool_calls? }, finish_reason }].
choices[].finish_reason  string   Optional  "stop" | "length" | "tool_calls" | "content_filter".
usage                    object   Optional  { prompt_tokens, completion_tokens, total_tokens, completion_tokens_details?, prompt_tokens_details?, cache_creation_input_tokens?, cache_creation? }. completion_tokens_details.reasoning_tokens is set when the model produces a reasoning trace. Cache fields appear when the upstream returns prompt-cache information: prompt_tokens_details.cached_tokens reports cache reads (the OpenAI standard), cache_creation_input_tokens aggregates writes, and cache_creation.ephemeral_5m_input_tokens / ephemeral_1h_input_tokens give the per-TTL split.
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-5.1-chat",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The capital of France is Paris."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 8,
    "total_tokens": 28
  }
}

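Reasoning and thinking

Models that support extended reasoning expose a thinking trace alongside the regular output. Airforce normalizes three different upstream conventions into one set of canonical parameters that work everywhere.

Look for supports_reasoning: true on the models returned by GET /v1/models to see which IDs accept these parameters.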

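Canonical parameters

Parameter         Type             Required  Description
reasoning_effort  string           Optional  "low" | "medium" | "high". For OpenAI o1/o3 and GPT-5 reasoning models, and any router that maps onto them.
thinking          string | object  Optional  "on" | "off" | "auto" as a quick toggle, or the Anthropic-native form { type: "enabled", budget_tokens: N }. Maps to Claude extended thinking, Gemini thinking, and DeepSeek reasoning.
thinking_budget   integer          Optional  Maximum number of tokens the model may spend on reasoning before producing visible output. Corresponds to budget_tokens.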

Reasoning effort (OpenAI style)

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "o3-mini",
    "messages": [{"role": "user", "content": "Prove the Pythagorean theorem."}],
    "reasoning_effort": "high"
  }'

Extended thinking (Anthropic style)

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "Plan a 7-day Italy trip."}],
    "thinking": {"type": "enabled", "budget_tokens": 4000}
  }'

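The reasoning trace itself appears in choices[0].message.reasoning (OpenAI shape) or as a thinking block in content (Anthropic shape). Reasoning tokens are billed and reported in usage.completion_tokens_details.reasoning_tokens.

The completion_tokens_details.reasoning_tokens breakdown appears only when the upstream provider reports it. In streamed responses, the trace arrives chunk by chunk via delta.reasoning_content.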
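
Because the trace can arrive in either shape, client code may want to handle both. The sketch below assumes the OpenAI shape uses a plain-string content next to message.reasoning, and that an Anthropic-shaped thinking block carries its text under a "thinking" key, as it does in Anthropic's own API; extract_reasoning is a hypothetical helper name.

```python
def extract_reasoning(response):
    """Return (reasoning, answer) from a non-streamed chat completion dict."""
    message = response["choices"][0]["message"]
    content = message.get("content")
    if isinstance(content, list):
        # Anthropic shape: content is a list of typed blocks.
        reasoning = "".join(
            b.get("thinking", "") for b in content if b.get("type") == "thinking"
        )
        answer = "".join(b.get("text", "") for b in content if b.get("type") == "text")
        return reasoning, answer
    # OpenAI shape: the trace sits next to the plain-string content.
    return message.get("reasoning") or "", content or ""
```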


Vision and image input

Models with supports_vision: true accept images embedded as content blocks. Public URLs and Base64 data URLs both work; size limits depend on the upstream model.

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
      ]
    }]
  }'
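
For a local file, the Base64 data URL variant can be built as follows. This is a small sketch; image_block is a hypothetical helper name, and the three bytes below are just the start of a JPEG header standing in for real image data.

```python
import base64


def image_block(image_bytes, mime_type="image/jpeg"):
    """Wrap raw image bytes in an image_url content block using a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        image_block(b"\xff\xd8\xff"),  # placeholder bytes, not a complete image
    ],
}
```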

Tool calling

Models with supports_tools: true can call functions you define. The model returns a tool_calls array; you run the call, then send the result back in a tool message.

Request

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }'

Response with a tool call

{
  "id": "chatcmpl-abc123",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"Paris\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

Follow-up with the tool result

{
  "model": "gpt-5.1-chat",
  "messages": [
    {"role": "user", "content": "What is the weather in Paris?"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"}
      }]
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp_c\": 14, \"sky\": \"cloudy\"}"}
  ]
}
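
The step between the tool-call response and the follow-up request can be sketched like this. The local get_weather function and the TOOLS registry are hypothetical stand-ins; a real tool would query an actual weather service.

```python
import json


def get_weather(location):
    """Hypothetical local tool returning canned data."""
    return {"temp_c": 14, "sky": "cloudy"}


TOOLS = {"get_weather": get_weather}


def append_tool_results(messages, assistant_message):
    """Return messages plus the assistant turn and one tool message per call."""
    out = messages + [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        out.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the id the model issued
            "content": json.dumps(fn(**args)),
        })
    return out
```

The returned list is what goes into messages for the follow-up request shown above.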

Structured outputs

Set response_format to make the model return JSON. Two modes are supported:

  • { "type": "json_object" } — the response is a single valid JSON value.
  • { "type": "json_schema", "json_schema": { "name", "schema", "strict" } } — the model is steered to produce JSON that matches your JSON Schema.
curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "Extract the city and country: I live in Paris, France."}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "location",
        "schema": {
          "type": "object",
          "properties": { "city": {"type": "string"}, "country": {"type": "string"} },
          "required": ["city", "country"]
        }
      }
    }
  }'

Reliability: even when a model wraps its answer in prose or a markdown code fence, Airforce extracts the JSON payload so you always receive parseable content. If no valid JSON can be recovered, the original text is returned unchanged — so the guarantee never makes a response worse. This applies to non-streamed responses; streamed responses are passed through unchanged.
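
Since streamed responses are passed through unchanged, a client that streams may want a similar recovery step on its side. The following is one possible approach, not Airforce's extraction logic: extract_json is a hypothetical helper that tries the whole text, then any fenced block, then the outermost braces, and falls back to the original text.

```python
import json
import re

# Matches a markdown code fence, optionally tagged "json".
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json(text):
    """Return the parsed JSON payload found in text, or text itself if none parses."""
    candidates = [text.strip()] + [m.strip() for m in FENCE.findall(text)]
    start, end = text.find("{"), text.rfind("}")
    if 0 <= start < end:
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return text  # nothing recoverable: hand back the original unchanged
```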


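Streaming

Set stream: true to receive partial completions as server-sent events. Each event is a JSON chunk with the same shape as the non-streaming response, except that message is replaced by delta. The stream ends with data: [DONE].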

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "Write a haiku about Berlin."}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Wire format

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"content":"Cold "},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"content":"stone "},"finish_reason":null}]}


data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":17,"total_tokens":29}}

data: [DONE]
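
A client reading this stream concatenates delta.content until it sees [DONE]. Here is a minimal sketch that works on any iterable of text lines (for example, lines decoded from an HTTP response); collect_stream is a hypothetical helper name.

```python
import json


def collect_stream(lines):
    """Accumulate delta.content from OpenAI-style SSE lines; return (text, usage)."""
    text, usage = [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        usage = chunk.get("usage") or usage  # usage rides on the final chunk
        for choice in chunk.get("choices", []):
            text.append(choice.get("delta", {}).get("content") or "")
    return "".join(text), usage
```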

POST /v1/messages

Anthropic-compatible Messages API. Works with the official @anthropic-ai/sdk by setting baseURL to https://api.airforce. Non-Claude models are transparently forwarded to OpenAI, Google, and other providers.

POST https://api.airforce/v1/messages

Request body

Parameter       Type            Required  Description
model           string          Required  Model ID (Anthropic format or a routing alias).
messages        array           Required  Each entry: { role: "user" | "assistant", content: string | array }.
max_tokens      integer         Required  Required by Anthropic. Token cap for the response.
system          string | array  Optional  System prompt. Pass an array of { type: "text", text, cache_control? } blocks to mark prefix segments for caching. See the prompt caching section.
temperature     float           Optional  0–1.
top_p           float           Optional  Nucleus sampling.
top_k           integer         Optional  Restricts the sampling pool to the top K tokens.
stop_sequences  array           Optional  Up to 4 stop sequences.
stream          boolean         Optional  If true, emits an Anthropic-style SSE event stream (see the streaming events section).
fallbacks       array           Optional  Fallback models (max 3) in Anthropic form: [{"model": "gpt-4o-mini"}]. If every channel of the primary model fails, each candidate is tried in order; you are billed for the model that actually answered, and the response model field reports it. A plain models string array is accepted too.
tools           array           Optional  Anthropic tool definitions: { name, description, input_schema }. The response may contain tool_use content blocks.
tool_choice     object          Optional  { type: "auto" | "any" | "tool", name? }.
thinking        object          Optional  Anthropic extended thinking: { type: "enabled", budget_tokens: N }.

Example

curl https://api.airforce/v1/messages \
  -H "x-api-key: sk-air-YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "max_tokens": 256,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'

Response shape

Parameter    Type    Required  Description
id           string  Optional  Message ID, e.g. "msg_01ABCxyz".
type         string  Optional  Always "message".
role         string  Optional  Always "assistant".
content      array   Optional  Array of content blocks: { type: "text" | "tool_use" | "thinking", … }.
model        string  Optional  Echoes the requested model, or the fallback model that actually answered when fallbacks are used.
stop_reason  string  Optional  "end_turn" | "max_tokens" | "stop_sequence" | "tool_use".
usage        object  Optional  { input_tokens, output_tokens, cache_read_input_tokens?, cache_creation_input_tokens?, cache_creation? }. Cache fields appear when prompt caching was used. cache_creation.ephemeral_5m_input_tokens and ephemeral_1h_input_tokens give the per-TTL write split.

Streaming events

Anthropic SSE uses named events rather than bare JSON chunks. Each event has an event: name and a data: JSON payload.

event: message_start
data: {"type":"message_start","message":{"id":"msg_01","role":"assistant","content":[],"model":"claude-sonnet-4.6","stop_reason":null,"usage":{"input_tokens":12,"output_tokens":1}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":17}}

event: message_stop
data: {"type":"message_stop"}
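
Reading this stream means tracking the current event name and acting on the data line that follows it. A minimal sketch for text output only (it ignores tool_use and thinking blocks); collect_anthropic_stream is a hypothetical helper name.

```python
import json


def collect_anthropic_stream(lines):
    """Return (text, stop_reason) from Anthropic-style SSE lines."""
    text, stop_reason, event = [], None, None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = json.loads(line[len("data:"):])
            if event == "content_block_delta" and data["delta"].get("type") == "text_delta":
                text.append(data["delta"]["text"])
            elif event == "message_delta":
                stop_reason = data["delta"].get("stop_reason", stop_reason)
            elif event == "message_stop":
                break
    return "".join(text), stop_reason
```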

POST /v1/messages/count_tokens

Anthropic-compatible token counting. Send the same system / messages / tools you would pass to /v1/messages and get an input-token estimate back without running the model — nothing is billed.

POST https://api.airforce/v1/messages/count_tokens
curl https://api.airforce/v1/messages/count_tokens \
  -H "x-api-key: sk-air-YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "Hello, Claude!"}]
  }'

# → {"input_tokens": 34}

The count is a fast character-based estimate (about 4 characters per token) over system, messages and tools — close enough for context-budget checks, not an exact tokenizer run.
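
A character-based estimate of this kind can be mirrored locally for quick budget checks. The sketch below is an assumption about the general approach, not the server's exact formula: how the server serializes blocks and what overhead it adds are not documented here, so the result will generally differ from the endpoint's number. estimate_input_tokens is a hypothetical helper name.

```python
import json
import math


def estimate_input_tokens(system="", messages=(), tools=()):
    """Rough input-token estimate at about 4 characters per token."""
    chars = len(system) if isinstance(system, str) else len(json.dumps(system))
    for message in messages:
        content = message["content"]
        # Block arrays are measured by their JSON serialization (an assumption).
        chars += len(content) if isinstance(content, str) else len(json.dumps(content))
    chars += sum(len(json.dumps(tool)) for tool in tools)
    return math.ceil(chars / 4)
```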


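Prompt caching

On /v1/messages with Claude models, mark a prefix for caching by passing system as an array of blocks in which the segment to cache carries cache_control: { type: "ephemeral" }. Subsequent requests that start with the same prefix are charged the cheaper cache-read rate. Models with supports_caching: true in /v1/models support this.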

{
  "model": "claude-sonnet-4.6",
  "max_tokens": 1024,
  "system": [
    {"type": "text", "text": "You are a senior staff engineer at Airforce."},
    {
      "type": "text",
      "text": "<repository-snapshot>...</repository-snapshot>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [
    {"role": "user", "content": "Where is rate limiting enforced?"}
  ]
}

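How cache counts are reported in responses

Cache token counts are delivered in each format's native shape, so the SDKs (openai, @anthropic-ai/sdk, @google/genai) can read them without custom code. Fields are omitted when their value is zero, which keeps uncached responses lean.

/v1/chat/completions (OpenAI format)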

"usage": {
  "prompt_tokens": 2104,
  "completion_tokens": 147,
  "total_tokens": 2251,
  "prompt_tokens_details": { "cached_tokens": 1980 },
  "cache_creation_input_tokens": 124,
  "cache_creation": {
    "ephemeral_5m_input_tokens": 124,
    "ephemeral_1h_input_tokens": 0
  }
}

/v1/messages (Anthropic format)

"usage": {
  "input_tokens": 2104,
  "output_tokens": 147,
  "cache_read_input_tokens": 1980,
  "cache_creation_input_tokens": 124,
  "cache_creation": {
    "ephemeral_5m_input_tokens": 124,
    "ephemeral_1h_input_tokens": 0
  }
}

/v1beta/.../generateContent (Gemini format)

"usageMetadata": {
  "promptTokenCount": 2104,
  "candidatesTokenCount": 147,
  "totalTokenCount": 2251,
  "cachedContentTokenCount": 1980
}
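
Code that talks to more than one of these endpoints may want a single view of the cache counters. A minimal sketch, assuming the three usage shapes shown above and treating omitted fields as zero; cache_counts is a hypothetical helper name.

```python
def cache_counts(usage):
    """Return {'read': int, 'write': int} from any of the three usage shapes."""
    if "promptTokenCount" in usage or "cachedContentTokenCount" in usage:
        # Gemini shape: only cache reads are reported.
        return {"read": usage.get("cachedContentTokenCount", 0), "write": 0}
    if "prompt_tokens" in usage:
        # OpenAI shape: reads are nested under prompt_tokens_details.
        read = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
    else:
        # Anthropic shape: reads are a flat field.
        read = usage.get("cache_read_input_tokens", 0)
    return {"read": read, "write": usage.get("cache_creation_input_tokens", 0)}
```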

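When caching applies

For Claude models, explicit cache_control markers work on both /v1/messages and /v1/chat/completions; place them on system or message content blocks. Many other providers (the OpenAI family, DeepSeek, Gemini) cache automatically: you send no markers, and cached_tokens shows up in the response as long as you reuse a sufficiently long prefix.

Cache lifetime: 5 minutes or 1 hour

A cached prefix lives for 5 minutes by default, and every hit resets the timer. For a longer-lived prefix, add ttl: "1h" to the marker. The response reports each TTL separately under cache_creation.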

"cache_control": { "type": "ephemeral", "ttl": "1h" }

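Example: write first, then read

Send the exact same request twice (the caching example above). The first call to see the prefix pays a one-time cache write; identical calls within the TTL pay the much cheaper cache read.

First call, cache write (usage excerpt):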

"usage": {
  "input_tokens": 2104,
  "output_tokens": 12,
  "cache_creation_input_tokens": 1980,
  "cache_read_input_tokens": 0
}

Second identical call within the TTL, cache read:

"usage": {
  "input_tokens": 2104,
  "output_tokens": 12,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 1980
}

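Limits and costs

  • Claude requires a minimum cacheable prefix (about 1024 tokens; larger for some models). Shorter prefixes are not cached at all.
  • At most 4 cache breakpoints per request, and the cached prefix must be byte-for-byte identical across calls; changing even one character misses the cache.
  • Cache writes cost more than regular input (5m ≈ 1.25×, 1h ≈ 2×); reads are much cheaper (≈ 0.1×). See the pricing page for per-model cache prices.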
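
The multipliers above make it easy to check when caching pays off. The arithmetic sketch below uses the approximate figures from the list and assumes every call after the first hits the cache within its TTL; costs are expressed in units of one uncached input token, and the function names are hypothetical.

```python
WRITE_5M, WRITE_1H, READ = 1.25, 2.0, 0.1  # approximate multipliers from the list above


def cached_cost(prefix_tokens, calls, write_multiplier=WRITE_5M):
    """Prefix cost over `calls` requests: one cache write, then cache reads."""
    if calls <= 0:
        return 0.0
    return prefix_tokens * (write_multiplier + READ * (calls - 1))


def uncached_cost(prefix_tokens, calls):
    """Prefix cost over `calls` requests with no caching at all."""
    return float(prefix_tokens * calls)
```

For the 1980-token prefix in the example, a single call costs more with caching (about 2475 versus 1980 units), while two calls already cost less with the 5-minute TTL (about 2673 versus 3960). With the 1-hour TTL the break-even under these figures comes at the third call.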

POST /v1/responses

OpenAI Responses API surface for stateful conversations. Uses the same Bearer / x-api-key authentication. Cache counts appear as input_tokens_details.cached_tokens (reads) plus the flat cache_creation_input_tokens and cache_creation.ephemeral_* fields (writes), at parity with /v1/chat/completions.

POST https://api.airforce/v1/responses

POST /v1beta/models/{model}:generateContent

Google Gemini-compatible endpoint. Works with the official @google/genai SDK and the Gemini CLI by pointing the base URL at https://api.airforce/v1beta. Any routed model works — requests are translated to and from the native Gemini shape, and the model is taken from the URL path (not the body).

POST https://api.airforce/v1beta/models/{model}:generateContent

Authentication

Pass your Airforce API key any of the three ways Google clients use:

# 1) query parameter (Google default)
?key=sk-air-YOUR_API_KEY

# 2) header
x-goog-api-key: sk-air-YOUR_API_KEY

# 3) bearer token
Authorization: Bearer sk-air-YOUR_API_KEY

Request body

Parameter          Type    Required  Description
contents           array   Required  Conversation turns. Each: { role: "user" | "model", parts: [...] }. A part is { text }, { functionCall: { name, args } }, or { functionResponse: { name, response } }. "model" is Gemini's term for the assistant role.
systemInstruction  object  Optional  System prompt: { parts: [{ text }] }.
generationConfig   object  Optional  { temperature, maxOutputTokens, topP, stopSequences }, mapped to the canonical sampling parameters.
tools              array   Optional  Tool definitions: [{ functionDeclarations: [{ name, description, parameters }] }]. functionDeclarations are flattened across entries.
toolConfig         object  Optional  Tool-choice control: { functionCallingConfig: { mode: "AUTO" | "ANY" | "NONE" } }. ANY forces a call, NONE disables tools.

Example

curl "https://api.airforce/v1beta/models/gemini-3.1-pro:generateContent" \
  -H "x-goog-api-key: sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "What is the capital of France?"}]}
    ],
    "systemInstruction": {"parts": [{"text": "You are a helpful assistant."}]},
    "generationConfig": {"temperature": 0.7, "maxOutputTokens": 256}
  }'
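
If your code already holds OpenAI-style messages, the request body above can be derived from them. This is a sketch for plain-text messages only (no images or tool calls); to_gemini is a hypothetical helper name, and the mapping simply follows the contents and systemInstruction shapes from the table.

```python
def to_gemini(messages):
    """Convert OpenAI-style text messages into a generateContent request body."""
    body = {"contents": []}
    system_parts = []
    for message in messages:
        part = {"text": message["content"]}
        if message["role"] == "system":
            system_parts.append(part)
        else:
            # Gemini calls the assistant role "model".
            role = "model" if message["role"] == "assistant" else "user"
            body["contents"].append({"role": role, "parts": [part]})
    if system_parts:
        body["systemInstruction"] = {"parts": system_parts}
    return body
```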

Response shape

Parameter                  Type    Required  Description
candidates                 array   Optional  Generated turns: [{ content: { role: "model", parts }, finishReason, index }]. Only the first candidate is populated.
candidates[].finishReason  string  Optional  "STOP" | "MAX_TOKENS" | "SAFETY" | "OTHER".
usageMetadata              object  Optional  { promptTokenCount, candidatesTokenCount, totalTokenCount, cachedContentTokenCount? }. cachedContentTokenCount appears when the upstream reported a cache read.
modelVersion               string  Optional  Echo of the requested model.
{
  "candidates": [{
    "content": {
      "role": "model",
      "parts": [{"text": "The capital of France is Paris."}]
    },
    "finishReason": "STOP",
    "index": 0
  }],
  "usageMetadata": {
    "promptTokenCount": 16,
    "candidatesTokenCount": 8,
    "totalTokenCount": 24
  },
  "modelVersion": "gemini-3.1-pro"
}

POST /v1beta/models/{model}:streamGenerateContent

Streaming uses the :streamGenerateContent action and returns Server-Sent Events. Each data: line is a full Gemini-shaped chunk (not a delta object); the final chunk carries usageMetadata.

data: {"candidates":[{"content":{"role":"model","parts":[{"text":"The capital"}]},"index":0}],"modelVersion":"gemini-3.1-pro"}

data: {"candidates":[{"content":{"role":"model","parts":[{"text":" is Paris."}]},"index":0}],"modelVersion":"gemini-3.1-pro"}

data: {"candidates":[{"content":{"role":"model","parts":[]},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":16,"candidatesTokenCount":8,"totalTokenCount":24}}

List models

The catalog is also exposed in Gemini Model-resource shape so Google clients can enumerate models.

curl https://api.airforce/v1beta/models

Notes: the base URL is https://api.airforce/v1beta (or /v1), not Google's host. The model name comes from the URL path, not the request body. Only the first candidate is returned, and a subset of Gemini fields is translated — safetySettings and cachedContent are currently ignored. Billing, rate limits and smart routing apply exactly as on /v1/chat/completions.


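Errors

Airforce returns standard HTTP status codes and a unified error envelope for both endpoints.

Status  Type                                      Description
400     invalid_request_error                     Malformed JSON, missing required field, or unknown model.
401     invalid_request_error / auth_required     API key missing or invalid.
402     insufficient_quota                        The model requires an active subscription or a positive Pay-as-you-Go balance.
403     model_access_denied / insufficient_scope  The plan or per-key permissions deny this request.
404     model_not_found                           The requested model does not exist or you do not have access to it.
429     rate_limit_error                          Request rate or daily token cap exceeded.
503     api_error / moderation_unavailable        All upstream keys for the requested provider failed.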
{
  "error": {
    "message": "The requested model does not exist or you do not have access to it.",
    "type": "model_not_found",
    "param": null,
    "code": "404"
  }
}

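The descriptive identifier is in type. code is the HTTP status as a string (e.g. "404"), and param is usually null, except for parameter-range validation errors, where it names the offending parameter.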
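
Client code can flatten the envelope into something easier to branch on. In the sketch below, parse_error is a hypothetical helper, and treating 429 and 503 as retryable is an assumption about sensible client behavior, not a rule stated by the API.

```python
import json

RETRYABLE = {429, 503}  # assumption: rate limits and upstream failures are worth retrying


def parse_error(status, body):
    """Flatten the error envelope into a small dict the caller can act on."""
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        error = {}  # body was not the expected JSON object
    return {
        "status": status,
        "type": error.get("type", "unknown"),
        "message": error.get("message", body),
        "param": error.get("param"),
        "retryable": status in RETRYABLE,
    }
```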

Explore models

See the full list of model IDs and their capability flags (vision, tools, reasoning, caching, context length, and more): /docs/api/models.

curl https://api.airforce/v1/models \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"