API REFERENCE

Chat completions

Generate chat responses across 100+ models from a single API. Drop-in compatible with OpenAI Chat Completions, Anthropic Messages, and OpenAI Responses.

Airforce supports both the OpenAI Chat Completions and Anthropic Messages wire formats on top of the same model pool. Pick the SDK you already use and just change the base URL; non-Claude models are routed transparently behind both surfaces.

This page covers authentication, the request and response shapes for both surfaces, streaming, tool calling, vision, reasoning, and prompt caching. New here? Start with the basic example below, get one call working, then add streaming, tools, or caching afterwards.

Authentication

Every request requires a Bearer token (your Airforce API key). The Anthropic x-api-key header is also accepted on /v1/messages for SDK compatibility.

Authorization: Bearer sk-air-YOUR_API_KEY
# alt for /v1/messages:
x-api-key: sk-air-YOUR_API_KEY

POST /v1/chat/completions

OpenAI-compatible Chat Completions. Works with the official openai SDK once you change base_url to https://api.airforce/v1.

POST https://api.airforce/v1/chat/completions

Request body

Parameter	Type	Required	Description
model	string	Required	Model ID. Use GET /v1/models to discover the available IDs.
messages	array	Required	Conversation history. Each entry has { role: "system" \| "user" \| "assistant" \| "tool", content }. Content is a string or an array of content blocks (vision, see below).
max_tokens	integer	Optional	Maximum number of tokens to generate. Capped at the model's max_output_tokens.
temperature	float	Optional	Sampling temperature, 0–2. Lower is more deterministic. The default depends on the upstream provider.
top_p	float	Optional	Nucleus sampling. Use temperature or top_p, not both.
stream	boolean	Optional	If true, the response is a stream of Server-Sent Events. See "Streaming" below.
models	array	Optional	Fallback models (max 3), e.g. ["deepseek-v3.2", "gpt-4o-mini"]. If every channel of the primary model fails, each candidate is tried in order. You are billed for — and response.model reports — the model that actually answered. Unknown or plan-gated candidates are skipped. With the OpenAI SDK pass it via extra_body.
transforms	array	Optional	Prompt transforms. Supported: ["middle-out"] — when the conversation overflows the model's context window, whole messages are dropped from the middle (system prompts, the first message and the most recent turns are kept), so long roleplay or agent histories keep working instead of erroring. Opt-in; off by default.
stream_options	object	Optional	{ include_usage: boolean }. Usage is always included on the final streaming chunk; this field is accepted for OpenAI compatibility but cannot turn it off.
stop	string \| array	Optional	Up to 4 stop sequences. Generation stops as soon as one of them is produced.
tools	array	Optional	Definitions of the functions the model may call. See "Tool calling" below.
tool_choice	string \| object	Optional	"auto" (default), "none", or { type: "function", function: { name } } to force a specific call.
response_format	object	Optional	{ type: "json_object" } forces the model to emit valid JSON. Ignored for models that do not support it.
reasoning_effort	string	Optional	Reasoning depth: "low" \| "medium" \| "high" \| "xhigh" \| "max". Any model with supports_reasoning: true (Claude, OpenAI o/GPT-5, Gemini, Qwen, DeepSeek, …). See "Reasoning & thinking".
thinking	string \| object	Optional	Cross-model thinking switch. "on" \| "off" \| "auto"; Anthropic-style { type: "enabled", budget_tokens: N }; hybrid { type: "enabled" \| "disabled" }. See "Reasoning & thinking".
thinking_budget	integer	Optional	Token limit for the model's reasoning trace (when the provider exposes one).
ignore_defaults	boolean	Optional	Skip the per-model default parameters saved by the user (configured in the dashboard) for this request.
skill	string	Optional	ID of a single marketplace skill to apply to this request. The skill transforms your messages/parameters before the upstream call and overrides any installed-skill defaults. Consumed by Airforce, never forwarded upstream. See the Skills catalog at /docs/api/extend.
skills	array	Optional	Array of marketplace skill IDs applied in order, for stacking multiple skills on one request.
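
To make the middle-out behaviour in the transforms row more concrete, here is a rough client-side sketch of the same idea. It is not Airforce's implementation: the function name middle_out, the keep_recent parameter, and the use of a message count in place of a real token budget are all assumptions made for illustration.

```python
def middle_out(messages, max_messages, keep_recent=4):
    """Illustrative only: drop whole messages from the middle of a history.

    Keeps every system message, the first non-system message, and the most
    recent `keep_recent` messages. A message count stands in for the token
    budget the gateway would actually use (assumption).
    """
    if len(messages) <= max_messages:
        return list(messages)

    # Index of the first non-system message, if there is one.
    first_turn = next(
        (i for i, m in enumerate(messages) if m["role"] != "system"), None
    )
    recent_start = max(len(messages) - keep_recent, 0)

    return [
        m
        for i, m in enumerate(messages)
        if m["role"] == "system" or i == first_turn or i >= recent_start
    ]


history = [{"role": "system", "content": "s"}] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": str(i)}
    for i in range(10)
]
trimmed = middle_out(history, max_messages=6, keep_recent=2)
```

In this sketch the trimmed history keeps the system prompt, the first user turn, and the last two turns; everything in between is dropped as whole messages.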

Basic example

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 200,
    "temperature": 0.7
  }'
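
The same call can be sketched in Python with only the standard library. The helper name build_chat_request is hypothetical; the URL, headers, and body fields are the ones shown in the curl example above. The request is built but not sent, so the snippet runs without a key or network access.

```python
import json
import urllib.request

BASE_URL = "https://api.airforce/v1"  # base URL from this page


def build_chat_request(api_key, model, messages, **params):
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "sk-air-YOUR_API_KEY",
    "gpt-5.1-chat",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=200,
    temperature=0.7,
)
# To send it: urllib.request.urlopen(req)  (needs a real key and network access)
```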

Response shape

Parameter	Type	Required	Description
id	string	Optional	Stable completion ID, e.g. "chatcmpl-abc123".
object	string	Optional	"chat.completion" for non-streaming, "chat.completion.chunk" for streaming.
created	integer	Optional	Unix timestamp (seconds).
model	string	Optional	Model ID. Echoes the requested model unless a fallback answered, in which case it reports the model that actually answered.
choices	array	Optional	Array of completion candidates: [{ index, message: { role, content, tool_calls? }, finish_reason }].
choices[].finish_reason	string	Optional	"stop" \| "length" \| "tool_calls" \| "content_filter".
usage	object	Optional	{ prompt_tokens, completion_tokens, total_tokens, completion_tokens_details?, prompt_tokens_details?, cache_creation_input_tokens?, cache_creation? }. completion_tokens_details.reasoning_tokens is set when the model produces a reasoning trace. Cache fields appear when the upstream returns prompt-caching info: prompt_tokens_details.cached_tokens reports cache reads (OpenAI standard), cache_creation_input_tokens aggregates writes, and cache_creation.ephemeral_5m_input_tokens / ephemeral_1h_input_tokens give the per-TTL split.

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-5.1-chat",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The capital of France is Paris."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 8,
    "total_tokens": 28
  }
}
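
A minimal sketch of reading this response shape in Python. The helper name summarize_completion and the keys of the dictionary it returns are hypothetical; the fields it reads are the ones listed in the table above.

```python
def summarize_completion(resp):
    """Return the first choice's text, finish reason, and token total."""
    choice = resp["choices"][0]
    usage = resp.get("usage", {})
    return {
        "model": resp.get("model"),
        "text": choice["message"].get("content"),
        "finish_reason": choice.get("finish_reason"),
        # "length" means generation stopped at the max_tokens limit.
        "truncated": choice.get("finish_reason") == "length",
        "total_tokens": usage.get("total_tokens"),
    }


sample_response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1710000000,
    "model": "gpt-5.1-chat",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "The capital of France is Paris."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 20, "completion_tokens": 8, "total_tokens": 28},
}
summary = summarize_completion(sample_response)
```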

Reasoning & thinking

Reasoning/thinking is a cross-model feature for every model ID with supports_reasoning: true — Claude, OpenAI o-series/GPT-5, Gemini, Qwen, DeepSeek, and others. You send the same canonical parameters; Airforce maps them to each provider's native shape. This is not a DeepSeek-only API.

Source of truth: check for supports_reasoning: true on the model in GET /v1/models (or GET /api/models/{id}/allowed-params). Prefer that flag over guessing from the model name.

Canonical parameters

Parameter	Type	Required	Description
reasoning_effort	string	Optional	"low" \| "medium" \| "high" \| "xhigh" \| "max". Accepted on every model with supports_reasoning: true. Some upstreams only honour a subset (e.g. high/max); others clamp unsupported levels to the nearest served value.
thinking	string \| object	Optional	Three accepted shapes (we normalise): "on" \| "off" \| "auto"; Anthropic-style { type: "enabled", budget_tokens: N }; hybrid { type: "enabled" \| "disabled" }. Mapped onto Claude extended thinking, OpenAI effort profiles, Gemini thinking_config, Qwen enable_thinking, DeepSeek hybrid, etc.
thinking_budget	integer	Optional	Maximum tokens the model may spend reasoning before emitting visible output. Mirrors budget_tokens when the upstream exposes a budget; takes precedence over reasoning_effort when both are sent and a budget is available.
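
The three accepted thinking shapes can be folded into one representation on the client as well. The sketch below is an assumption about what such a normalisation might look like, not a description of Airforce's internal mapping; the function name normalize_thinking and the (mode, budget_tokens) tuple are hypothetical.

```python
def normalize_thinking(value):
    """Map the three documented `thinking` shapes to (mode, budget_tokens)."""
    if isinstance(value, str):
        if value not in ("on", "off", "auto"):
            raise ValueError(f"unknown thinking mode: {value!r}")
        return value, None
    if isinstance(value, dict):
        kind = value.get("type")
        if kind == "enabled":
            # budget_tokens is only present in the Anthropic-style shape.
            return "on", value.get("budget_tokens")
        if kind == "disabled":
            return "off", None
    raise ValueError(f"unsupported thinking value: {value!r}")
```

For example, normalize_thinking({"type": "enabled", "budget_tokens": 4000}) yields ("on", 4000), while both "off" and {"type": "disabled"} yield ("off", None).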

What differs by family (mapping only)

Parameters are the same everywhere. Only how we map them (and how hard "off" is) differs:

Claude — Thinking on/off + budget; often also reasoning_effort via the gateway.
OpenAI (o1/o3, GPT-5) — Mainly reasoning_effort. A full "thinking off" is often not available — you control how strongly the model reasons, not always whether it reasons at all.
Gemini — thinking_config / budget mapped internally.
Qwen / Xiaomi / Alibaba — thinking + enable_thinking-style controls.
DeepSeek (generic) — Hybrid on/off is especially clear: thinking: { type: enabled|disabled } plus optional reasoning_effort.
Resellers / other — Often generic passthrough of the same canonical fields.

Controlling where the trace appears

An optional reasoning object on the request decides what happens to the thinking trace. It is consumed by Airforce and never forwarded upstream.

Parameter	Type	Required	Description
reasoning.format	string	Optional	"separate" (default) puts the trace in message.reasoning (and delta.reasoning while streaming). "inline" keeps the legacy inline <think>…</think> form inside content.
reasoning.exclude	boolean	Optional	When true, the reasoning trace is dropped entirely from the response. Reasoning tokens are still counted and billed if the model produced them.

"reasoning": { "format": "separate", "exclude": false }

Reasoning effort (OpenAI style)

Primary control for o-series and GPT-5: how much the model may reason. Same canonical field as on every other supports_reasoning model — OpenAI is included, but behaviour is not 1:1 with DeepSeek's hard on/off.

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "o3-mini",
    "messages": [{"role": "user", "content": "Prove the Pythagorean theorem."}],
    "reasoning_effort": "high"
  }'

Extended thinking (Anthropic style)

Budget-based thinking for Claude (and gateways that accept the Anthropic shape). You can still send reasoning_effort; we map when the channel supports it.

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "Plan a 7-day Italy trip."}],
    "thinking": {"type": "enabled", "budget_tokens": 4000}
  }'

Hybrid thinking (e.g. DeepSeek V3.2/V4)

Example of a hybrid model family with a clear Thinking / Non-Thinking switch — not a separate protocol. deepseek-v3.2, deepseek-v4-flash and deepseek-v4-pro accept the same canonical fields as every other supports_reasoning model. Toggle thinking and optionally set effort in one request:

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Solve this step by step: integrate x^2 * e^x."}],
    "thinking": {"type": "enabled"},
    "reasoning_effort": "high"
  }'

Turn thinking off (faster, cheaper when you only need the final answer) — this hard off is clearer on hybrid models than on many OpenAI o-series profiles:

"thinking": {"type": "disabled"}
// or simply: "thinking": "off"

Native docs for this family often list effort levels such as "high" and "max". We accept the full low…max scale and map unsupported levels to the nearest value that reaches the model. Prefer the hybrid IDs above over retired deepseek-chat / deepseek-reasoner names when you need an explicit on/off switch.

The reasoning trace itself appears in choices[0].message.reasoning (OpenAI shape) or as thinking blocks inside content (Anthropic shape). Reasoning tokens are billed and reported in usage.completion_tokens_details.reasoning_tokens.

The completion_tokens_details.reasoning_tokens breakdown only appears when the upstream provider reports it. On streamed responses, the trace arrives in delta.reasoning_content on each chunk.
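
Reading the trace from a non-streamed OpenAI-shape message can be sketched as follows, covering both reasoning.format values described earlier: the separate message.reasoning field and the legacy inline <think>…</think> form. The helper name split_reasoning is hypothetical.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(message):
    """Return (reasoning, answer) for either trace format."""
    content = message.get("content") or ""
    if message.get("reasoning") is not None:  # "separate" format
        return message["reasoning"], content
    traces = THINK_RE.findall(content)  # "inline" format
    answer = THINK_RE.sub("", content).strip()
    reasoning = "\n".join(t.strip() for t in traces) or None
    return reasoning, answer
```

When the request set reasoning.exclude to true, neither form is present and the sketch returns None for the trace.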

Vision & image input

Models with supports_vision: true accept images embedded as content blocks. Both public URLs and base64 data URLs work; size limits depend on the upstream model.

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
      ]
    }]
  }'
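
For local files, the base64 data URL variant can be sketched like this. The helper names image_block and vision_message are hypothetical; the block layout mirrors the image_url block in the curl example above, and the default MIME type is an assumption you should set to match your file.

```python
import base64


def image_block(data, mime_type="image/jpeg"):
    """Wrap raw image bytes as an image_url content block with a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_message(prompt, data, mime_type="image/jpeg"):
    """Build a user message that pairs a text prompt with one image."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": prompt}, image_block(data, mime_type)],
    }
```

A typical call reads the bytes from disk first, for example vision_message("What is in this image?", open("cat.jpg", "rb").read()), where cat.jpg is a placeholder path.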

Tool calling

Models with supports_tools: true can call functions you define. The model returns a tool_calls array; you execute the call, then send the result back in a tool message.

Request

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }'

Response with a tool call

{
  "id": "chatcmpl-abc123",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"Paris\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

Follow up with the tool result

{
  "model": "gpt-5.1-chat",
  "messages": [
    {"role": "user", "content": "What is the weather in Paris?"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"}
      }]
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp_c\": 14, \"sky\": \"cloudy\"}"}
  ]
}
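
The step between the tool-call response and the follow-up request can be sketched as a small dispatcher. The names run_tool_calls and registry are hypothetical, and get_weather here is a stand-in that returns fixed data rather than calling a weather service.

```python
import json


def run_tool_calls(assistant_message, registry):
    """Execute each tool call locally and return the matching tool messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        func = registry[call["function"]["name"]]
        # `arguments` arrives as a JSON-encoded string, not an object.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(func(**args)),
        })
    return results


def get_weather(location):
    """Hypothetical local implementation with canned data."""
    return {"temp_c": 14, "sky": "cloudy"}


assistant = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"},
    }],
}
tool_messages = run_tool_calls(assistant, {"get_weather": get_weather})
```

Append the assistant message and then tool_messages to the conversation before sending the follow-up request, so every tool_call_id is answered.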

Assistant prefill

End your messages array with an assistant message that already contains some text, and the model continues from it instead of starting a fresh turn. This is a reliable way to force a response to begin a specific way — a leading "{" for JSON, a chosen language, or a fixed prefix. The same trick works on /v1/messages. Providers that reject native prefill are handled automatically: the gateway retries once with a compatible rewrite, so you do not have to special-case them.

{
  "model": "claude-sonnet-4.6",
  "messages": [
    {"role": "user", "content": "List three primary colors as a JSON array."},
    {"role": "assistant", "content": "["}
  ]
}

Structured outputs

Set response_format to make the model return JSON. Two modes are supported:

{ "type": "json_object" } — the response is a single valid JSON value.
{ "type": "json_schema", "json_schema": { "name", "schema", "strict" } } — the model is steered to produce JSON that matches your JSON Schema.

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "Extract the city and country: I live in Paris, France."}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "location",
        "schema": {
          "type": "object",
          "properties": { "city": {"type": "string"}, "country": {"type": "string"} },
          "required": ["city", "country"]
        }
      }
    }
  }'

Reliability: even when a model wraps its answer in prose or a markdown code fence, Airforce extracts the JSON payload so you always receive parseable content. If no valid JSON can be recovered, the original text is returned unchanged — so the guarantee never makes a response worse. This applies to non-streamed responses; streamed responses are passed through unchanged.
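
Because streamed responses are passed through unchanged, a client that streams may want a similar recovery step of its own. The sketch below is one possible approach and is not Airforce's extraction logic; the function name extract_json and its three-step strategy (whole text, fenced block, widest bracketed span) are assumptions.

```python
import json
import re

FENCE = "`" * 3  # built this way so the example can sit inside a code fence
FENCED_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(text):
    """Best-effort JSON recovery; returns the original text if nothing parses."""
    candidates = [text]
    fenced = FENCED_RE.search(text)
    if fenced:
        candidates.append(fenced.group(1))
    # Widest span between the first opening and the last closing bracket.
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    ends = [i for i in (text.rfind("}"), text.rfind("]")) if i != -1]
    if starts and ends and min(starts) < max(ends):
        candidates.append(text[min(starts): max(ends) + 1])
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return text
```

As with the gateway behaviour described above, the fallback is to hand back the original text, so the helper never makes a response worse.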

Streaming

Set stream: true to receive partial completions as Server-Sent Events. Each event is a single JSON chunk with the same shape as the non-streaming response, except that message is replaced by delta. The stream ends with data: [DONE].

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "Write a haiku about Berlin."}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Wire format

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"content":"Cold "},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"content":"stone "},"finish_reason":null}]}

…

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":17,"total_tokens":29}}

data: [DONE]
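
A minimal sketch of consuming this wire format: it takes an iterable of text lines (for example, decoded lines from an HTTP response body), joins the delta.content pieces, and keeps the usage object from the final chunk. The function name collect_stream is hypothetical, and the sample lines are shortened versions of the chunks above.

```python
import json


def collect_stream(lines):
    """Join delta.content pieces from SSE lines; return (text, usage)."""
    parts, usage = [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        usage = chunk.get("usage") or usage
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts), usage


sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Cold "},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"stone"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"total_tokens":29}}',
    "",
    "data: [DONE]",
]
text, usage = collect_stream(sample_lines)
```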

Reliability & smart routing

Every model ID resolves to a pool of upstream providers behind the scenes. If the first one errors or times out, the request is automatically retried against the next provider for the same model, in order, before any failure is returned — you do not configure or trigger this. The model field in the response always reports the variant that actually answered. This is independent of the optional models / fallbacks array, which adds your own cross-model candidates on top: first the primary model exhausts its own provider chain, then each fallback model exhausts its chain.

POST /v1/messages

Anthropic-compatible Messages API. Works with the official @anthropic-ai/sdk once you set baseURL to https://api.airforce. Requests for non-Claude models are forwarded transparently to OpenAI, Google, and other providers.

POST https://api.airforce/v1/messages

Request body

Parameter	Type	Required	Description
model	string	Required	Model ID (Anthropic format or a routed alias).
messages	array	Required	Each entry: { role: "user" \| "assistant", content: string \| array }.
max_tokens	integer	Required	Required by Anthropic. Token limit for the response.
system	string \| array	Optional	System prompt. Pass an array of { type: "text", text, cache_control? } blocks to mark cached prefix segments. See "Prompt caching".
temperature	float	Optional	0–1.
top_p	float	Optional	Nucleus sampling.
top_k	integer	Optional	Restrict the sampling pool to the top K tokens.
stop_sequences	array	Optional	Up to 4 stop sequences.
stream	boolean	Optional	If true, emits an Anthropic-style SSE event stream (see "Streaming events" below).
fallbacks	array	Optional	Fallback models (max 3) in Anthropic form: [{"model": "gpt-4o-mini"}]. If every channel of the primary model fails, each candidate is tried in order; you are billed for — and the response model field reports — the model that actually answered. A plain models string array is accepted too.
tools	array	Optional	Anthropic tool definitions: { name, description, input_schema }. The response may contain tool_use content blocks.
tool_choice	object	Optional	{ type: "auto" \| "any" \| "tool", name? }.
thinking	object	Optional	Anthropic extended thinking: { type: "enabled", budget_tokens: N }.

Example

curl https://api.airforce/v1/messages \
  -H "x-api-key: sk-air-YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "max_tokens": 256,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'

Response shape

Parameter	Type	Required	Description
id	string	Optional	Message ID, e.g. "msg_01ABCxyz".
type	string	Optional	Always "message".
role	string	Optional	Always "assistant".
content	array	Optional	Array of content blocks: { type: "text" \| "tool_use" \| "thinking", … }.
model	string	Optional	Model ID. Echoes the requested model unless a fallback answered, in which case it reports the model that actually answered.
stop_reason	string	Optional	"end_turn" \| "max_tokens" \| "stop_sequence" \| "tool_use".
usage	object	Optional	{ input_tokens, output_tokens, cache_read_input_tokens?, cache_creation_input_tokens?, cache_creation? }. Cache fields appear when prompt caching is used. cache_creation.ephemeral_5m_input_tokens and ephemeral_1h_input_tokens give the per-TTL split of writes.

Streaming events

Anthropic SSE uses named events rather than bare JSON chunks. Each event has both an event: name and a data: JSON payload.

event: message_start
data: {"type":"message_start","message":{"id":"msg_01","role":"assistant","content":[],"model":"claude-sonnet-4.6","stop_reason":null,"usage":{"input_tokens":12,"output_tokens":1}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":17}}

event: message_stop
data: {"type":"message_stop"}
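
A minimal sketch of consuming these named events: it tracks the most recent event: name, appends text_delta pieces, and records the stop reason from message_delta. The function name collect_anthropic_stream is hypothetical, and the sample is a shortened version of the stream above.

```python
import json


def collect_anthropic_stream(lines):
    """Join text_delta pieces from named SSE events; return (text, stop_reason)."""
    parts, stop_reason, event = [], None, None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = json.loads(line[len("data:"):].strip())
            if event == "content_block_delta" and data["delta"].get("type") == "text_delta":
                parts.append(data["delta"]["text"])
            elif event == "message_delta":
                stop_reason = data["delta"].get("stop_reason", stop_reason)
            elif event == "message_stop":
                break
    return "".join(parts), stop_reason


sample_events = [
    "event: message_start",
    'data: {"type":"message_start","message":{"id":"msg_01","role":"assistant","content":[]}}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":", world"}}',
    "",
    "event: message_delta",
    'data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":17}}',
    "",
    "event: message_stop",
    'data: {"type":"message_stop"}',
]
streamed_text, stop_reason = collect_anthropic_stream(sample_events)
```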

POST /v1/messages/count_tokens

Anthropic-compatible token counting. Send the same system / messages / tools you would pass to /v1/messages and get an input-token estimate back without running the model — nothing is billed.

POST https://api.airforce/v1/messages/count_tokens

curl https://api.airforce/v1/messages/count_tokens \
  -H "x-api-key: sk-air-YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "Hello, Claude!"}]
  }'

# → {"input_tokens": 34}

The count is a fast character-based estimate (about 4 characters per token) over system, messages and tools — close enough for context-budget checks, not an exact tokenizer run.

Prompt caching

On /v1/messages with Claude models, mark a prefix as cacheable by passing system as an array of blocks in which the cached segment carries cache_control: { type: "ephemeral" }. Subsequent requests that start with the same prefix are charged the cheaper cache-read rate. Models with supports_caching: true in /v1/models support this.

Write vs read pricing

Cache writes are typically charged slightly above normal input (about 1.25× on Claude-family models). Cache reads are much cheaper (about 0.1× input). A large write with almost no later read is the expensive case — not a “cache discount”. Only reusing the same prefix turns the write into savings.

Tools like Claude Code often attach a large project context with cache markers on the first turns. Expect cache-write spend while the repo/system prefix is loaded; later turns only get cheap if that prefix is stable and reused. Subagents and multi-step agents can multiply large contexts across several requests.

{
  "model": "claude-sonnet-4.6",
  "max_tokens": 1024,
  "system": [
    {"type": "text", "text": "You are a senior staff engineer at Airforce."},
    {
      "type": "text",
      "text": "<repository-snapshot>...</repository-snapshot>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [
    {"role": "user", "content": "Where is rate limiting enforced?"}
  ]
}
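
Building such a system array in code might look like the sketch below. The helper name cached_system and its long_lived flag are hypothetical; the block layout follows the example above, and the optional "ttl": "1h" marker is the one described under cache duration later on this page.

```python
def cached_system(instructions, shared_context, long_lived=False):
    """Build a `system` array whose second block is marked as a cache breakpoint."""
    cache_control = {"type": "ephemeral"}
    if long_lived:
        cache_control["ttl"] = "1h"  # default lifetime is 5 minutes without this
    return [
        {"type": "text", "text": instructions},
        {"type": "text", "text": shared_context, "cache_control": cache_control},
    ]


system = cached_system(
    "You are a senior staff engineer at Airforce.",
    "<repository-snapshot>...</repository-snapshot>",
)
```

Keeping the large, stable text in the marked block and anything that changes per request in messages is what lets later calls reuse the prefix.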

How cache counts are reported in the response

Cache token counts are passed through in each format's native shape, so the SDKs (openai, @anthropic-ai/sdk, @google/genai) read them without custom code. Fields are omitted when their value is zero, which keeps non-cached responses lean.

/v1/chat/completions (OpenAI shape)

"usage": {
  "prompt_tokens": 2104,
  "completion_tokens": 147,
  "total_tokens": 2251,
  "prompt_tokens_details": { "cached_tokens": 1980 },
  "cache_creation_input_tokens": 124,
  "cache_creation": {
    "ephemeral_5m_input_tokens": 124,
    "ephemeral_1h_input_tokens": 0
  }
}

/v1/messages (Anthropic shape)

"usage": {
  "input_tokens": 2104,
  "output_tokens": 147,
  "cache_read_input_tokens": 1980,
  "cache_creation_input_tokens": 124,
  "cache_creation": {
    "ephemeral_5m_input_tokens": 124,
    "ephemeral_1h_input_tokens": 0
  }
}

/v1beta/.../generateContent (Gemini shape)

"usageMetadata": {
  "promptTokenCount": 2104,
  "candidatesTokenCount": 147,
  "totalTokenCount": 2251,
  "cachedContentTokenCount": 1980
}
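
If one code path has to read cache counts from all three shapes shown above, a small adapter can help. This is a sketch: the function name cache_counts and the {"read", "write"} result are hypothetical, and reporting write as 0 for the Gemini shape is an assumption, since that shape only exposes a read count here.

```python
def cache_counts(usage):
    """Return {"read": n, "write": n} from an OpenAI, Anthropic, or Gemini usage object."""
    if "promptTokenCount" in usage or "cachedContentTokenCount" in usage:  # Gemini
        return {"read": usage.get("cachedContentTokenCount", 0), "write": 0}
    if "input_tokens" in usage:  # Anthropic
        read = usage.get("cache_read_input_tokens", 0)
    else:  # OpenAI
        read = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    # Missing fields mean zero, because zero-valued fields are omitted.
    return {"read": read, "write": usage.get("cache_creation_input_tokens", 0)}


openai_usage = {
    "prompt_tokens": 2104, "completion_tokens": 147, "total_tokens": 2251,
    "prompt_tokens_details": {"cached_tokens": 1980},
    "cache_creation_input_tokens": 124,
}
anthropic_usage = {
    "input_tokens": 2104, "output_tokens": 147,
    "cache_read_input_tokens": 1980, "cache_creation_input_tokens": 124,
}
gemini_usage = {
    "promptTokenCount": 2104, "candidatesTokenCount": 147,
    "totalTokenCount": 2251, "cachedContentTokenCount": 1980,
}
```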

Where caching applies

Explicit cache_control markers are honoured on /v1/messages and /v1/chat/completions for Claude models; attach them to system or message content blocks. Many other providers (OpenAI family, DeepSeek, Gemini) cache automatically: you send no marker and simply see cached_tokens in the response once a sufficiently long prefix is reused.

Cache duration: 5 minutes or 1 hour

A cached prefix lasts 5 minutes by default, and the timer is refreshed on every hit. For a prefix that lives longer, add ttl: "1h" to the marker. The response reports each TTL separately under cache_creation.

"cache_control": { "type": "ephemeral", "ttl": "1h" }

Example: write first, then read

Send the exact same request twice (the caching example above). The first call to see the prefix pays a one-time cache write; an identical call within the TTL pays the much cheaper cache read.

First call, cache write (usage excerpt):

"usage": {
  "input_tokens": 2104,
  "output_tokens": 12,
  "cache_creation_input_tokens": 1980,
  "cache_read_input_tokens": 0
}

Second identical call within the TTL, cache read:

"usage": {
  "input_tokens": 2104,
  "output_tokens": 12,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 1980
}

Limits & costs

Claude requires a minimum cacheable prefix (about 1024 tokens; larger for some models). Shorter prefixes are not cached.
Up to 4 cache breakpoints per request, and the cached prefix must be byte-for-byte identical between calls; even a one-character change misses the cache.
Cache writes cost more than normal input (5m ≈ 1.25×, 1h ≈ 2×); reads are much cheaper (≈ 0.1×). See each model's cache prices on the pricing page.
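
The trade-off in these multipliers can be worked through with a little arithmetic. The sketch below assumes exactly one cache write followed by reads that all land inside the TTL, and it uses the approximate multipliers quoted on this page rather than any model's actual price list. The function name prefix_cost is hypothetical, and costs are expressed in units of one normal input token.

```python
WRITE_MULTIPLIER = {"5m": 1.25, "1h": 2.0}  # approximate figures from this page
READ_MULTIPLIER = 0.1


def prefix_cost(prefix_tokens, calls, ttl="5m"):
    """Return (cached, uncached) input cost of a prefix over `calls` requests.

    Assumes one write on the first call and a cache read on every later call.
    """
    uncached = prefix_tokens * calls
    cached = prefix_tokens * (WRITE_MULTIPLIER[ttl] + READ_MULTIPLIER * (calls - 1))
    return cached, uncached
```

With a 2,000-token prefix and the 5-minute TTL, a single call costs about 2,500 token-equivalents cached against 2,000 uncached, so caching loses. Two calls cost about 2,700 against 4,000, so under these assumptions the write pays for itself from the second call onwards.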

POST /v1/responses

OpenAI Responses-API surface for stateful conversations. Same Bearer/x-api-key authentication. Cache counts appear as input_tokens_details.cached_tokens (reads) plus a flat cache_creation_input_tokens and cache_creation.ephemeral_* (writes), for parity with /v1/chat/completions.

POST https://api.airforce/v1/responses

POST /v1beta/models/{model}:generateContent

Google Gemini-compatible endpoint. Works with the official @google/genai SDK and the Gemini CLI by pointing the base URL at https://api.airforce/v1beta. Any routed model works — requests are translated to and from the native Gemini shape, and the model is taken from the URL path (not the body).

POST https://api.airforce/v1beta/models/{model}:generateContent

Authentication

Pass your Airforce API key any of the three ways Google clients use:

# 1) query parameter (Google default)
?key=sk-air-YOUR_API_KEY

# 2) header
x-goog-api-key: sk-air-YOUR_API_KEY

# 3) bearer token
Authorization: Bearer sk-air-YOUR_API_KEY

Request body

Parameter	Type	Required	Description
contents	array	Required	Conversation turns. Each: { role: "user" \| "model", parts: [...] }. A part is { text }, { functionCall: { name, args } }, or { functionResponse: { name, response } }. "model" is Gemini's term for the assistant role.
systemInstruction	object	Optional	System prompt: { parts: [{ text }] }.
generationConfig	object	Optional	{ temperature, maxOutputTokens, topP, stopSequences } — mapped to the canonical sampling parameters.
tools	array	Optional	Tool definitions: [{ functionDeclarations: [{ name, description, parameters }] }]. functionDeclarations are flattened across entries.
toolConfig	object	Optional	Tool-choice control: { functionCallingConfig: { mode: "AUTO" \| "ANY" \| "NONE" } }. ANY forces a call, NONE disables tools.

Example

curl "https://api.airforce/v1beta/models/gemini-3.1-pro:generateContent" \
  -H "x-goog-api-key: sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "What is the capital of France?"}]}
    ],
    "systemInstruction": {"parts": [{"text": "You are a helpful assistant."}]},
    "generationConfig": {"temperature": 0.7, "maxOutputTokens": 256}
  }'
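
If your application already holds OpenAI-style messages, the request body above can be derived from them. The sketch below handles plain text messages only (no tool calls or images), and the function name to_gemini_body is hypothetical; the field names follow the request body table.

```python
def to_gemini_body(messages, temperature=None, max_tokens=None):
    """Translate OpenAI-style text messages into a generateContent body."""
    body = {"contents": []}
    system_parts = []
    for m in messages:
        if m["role"] == "system":
            system_parts.append({"text": m["content"]})
        else:
            # "model" is Gemini's term for the assistant role.
            role = "model" if m["role"] == "assistant" else "user"
            body["contents"].append({"role": role, "parts": [{"text": m["content"]}]})
    if system_parts:
        body["systemInstruction"] = {"parts": system_parts}
    config = {}
    if temperature is not None:
        config["temperature"] = temperature
    if max_tokens is not None:
        config["maxOutputTokens"] = max_tokens
    if config:
        body["generationConfig"] = config
    return body


gemini_body = to_gemini_body(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.7,
    max_tokens=256,
)
```

The model is not part of this body; it goes in the URL path, as in the curl example.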

Response shape

Parameter	Type	Required	Description
candidates	array	Optional	Generated turns: [{ content: { role: "model", parts }, finishReason, index }]. Only the first candidate is populated.
candidates[].finishReason	string	Optional	"STOP" \| "MAX_TOKENS" \| "SAFETY" \| "OTHER".
usageMetadata	object	Optional	{ promptTokenCount, candidatesTokenCount, totalTokenCount, cachedContentTokenCount? }. cachedContentTokenCount appears when the upstream reported a cache read.
modelVersion	string	Optional	Echo of the requested model.

{
  "candidates": [{
    "content": {
      "role": "model",
      "parts": [{"text": "The capital of France is Paris."}]
    },
    "finishReason": "STOP",
    "index": 0
  }],
  "usageMetadata": {
    "promptTokenCount": 16,
    "candidatesTokenCount": 8,
    "totalTokenCount": 24
  },
  "modelVersion": "gemini-3.1-pro"
}

POST /v1beta/models/{model}:streamGenerateContent

Streaming uses the :streamGenerateContent action and returns Server-Sent Events. Each data: line is a full Gemini-shaped chunk (not a delta object); the final chunk carries usageMetadata.

data: {"candidates":[{"content":{"role":"model","parts":[{"text":"The capital"}]},"index":0}],"modelVersion":"gemini-3.1-pro"}

data: {"candidates":[{"content":{"role":"model","parts":[{"text":" is Paris."}]},"index":0}],"modelVersion":"gemini-3.1-pro"}

data: {"candidates":[{"content":{"role":"model","parts":[]},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":16,"candidatesTokenCount":8,"totalTokenCount":24}}

List models

The catalog is also exposed in Gemini Model-resource shape so Google clients can enumerate models.

curl https://api.airforce/v1beta/models

Notes: the base URL is https://api.airforce/v1beta (or /v1), not Google's host. The model name comes from the URL path, not the request body. Only the first candidate is returned, and a subset of Gemini fields is translated — safetySettings and cachedContent are currently ignored. Billing, rate limits and smart routing apply exactly as on /v1/chat/completions.

Errors

Airforce returns standard HTTP status codes and a uniform error envelope for both endpoints.

Status	Type	Description
400	invalid_request_error	Malformed JSON, a missing required field, or an unknown model.
401	invalid_request_error / auth_required	API key missing or invalid.
402	insufficient_quota	The model requires an active subscription or a positive Pay-as-you-Go balance.
403	model_access_denied / insufficient_scope	Plan or per-key permissions deny this request.
404	model_not_found	The requested model does not exist or you do not have access to it.
429	rate_limit_error	Request rate or daily token limit exceeded.
503	api_error / moderation_unavailable	All upstream keys for the requested provider failed.

{
  "error": {
    "message": "The requested model does not exist or you do not have access to it.",
    "type": "model_not_found",
    "param": null,
    "code": "404"
  }
}

The descriptive slug lives in type. code is the HTTP status as a string (e.g. "404"), and param is null except on parameter-range validation errors, where it names the offending parameter.
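
Client-side handling of this envelope might be sketched as follows. The class name AirforceError and the function parse_error are hypothetical, and treating 429 and 503 as worth retrying with backoff is an assumption of this sketch rather than documented guidance.

```python
import json

RETRYABLE_STATUSES = {429, 503}  # assumption: transient, retry with backoff


class AirforceError(Exception):
    def __init__(self, status, type_, message, param=None):
        super().__init__(f"{status} {type_}: {message}")
        self.status, self.type, self.param = status, type_, param

    @property
    def retryable(self):
        return self.status in RETRYABLE_STATUSES


def parse_error(status, body_text):
    """Turn an error response body into an AirforceError."""
    try:
        err = json.loads(body_text).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        err = {}  # body was not the expected JSON envelope
    return AirforceError(
        status,
        err.get("type", "unknown_error"),
        err.get("message", body_text),
        err.get("param"),
    )


sample_body = json.dumps({
    "error": {
        "message": "The requested model does not exist or you do not have access to it.",
        "type": "model_not_found",
        "param": None,
        "code": "404",
    }
})
error = parse_error(404, sample_body)
```

Branching on type rather than on the message text keeps the handling stable if the wording changes.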

Discover models

See the full list of model IDs and their capability flags (vision, tools, reasoning, cache, context length, …) at /docs/api/models.

curl https://api.airforce/v1/models \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"
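
Once the catalog is fetched, filtering by capability flag is straightforward. The sketch below assumes the response follows the OpenAI list shape, a top-level data array of model objects carrying the supports_* flags mentioned on this page; that shape, the function name models_with, and the model IDs in the sample are all assumptions for illustration.

```python
def models_with(catalog, *flags):
    """Return IDs of models whose listed capability flags are all true."""
    return [
        m["id"]
        for m in catalog.get("data", [])
        if all(m.get(flag) is True for flag in flags)
    ]


# Made-up catalog entries, not real model IDs.
sample_catalog = {
    "data": [
        {"id": "example-reasoner", "supports_reasoning": True, "supports_tools": True},
        {"id": "example-vision", "supports_vision": True, "supports_tools": True},
        {"id": "example-basic"},
    ]
}
reasoning_models = models_with(sample_catalog, "supports_reasoning")
tool_models = models_with(sample_catalog, "supports_tools")
```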