API REFERENCE

Chat completions

Generate chat responses from more than 100 models through a single API. Compatible with OpenAI Chat Completions, Anthropic Messages, and OpenAI Responses.

Airforce speaks both the OpenAI Chat Completions and Anthropic Messages wire formats over the same set of models. Pick the SDK you already use and change only the base URL; non-Claude models are proxied transparently behind both surfaces.

This page covers authentication, the request and response shapes for both surfaces, streaming, tool calling, vision, reasoning, and prompt caching. New here? Start with the basic example below, get one call working, then add streaming, tools, or caching on top.

Authentication

Every request needs a Bearer token (your Airforce API key). The Anthropic x-api-key header is also accepted on /v1/messages for SDK compatibility.

Authorization: Bearer sk-air-YOUR_API_KEY
# alt for /v1/messages:
x-api-key: sk-air-YOUR_API_KEY

POST /v1/chat/completions

OpenAI-compatible Chat Completions. Works with the official openai SDK by overriding base_url with https://api.airforce/v1.

POST https://api.airforce/v1/chat/completions

Request body

Parameter	Type	Required	Description
model	string	Required	Model ID. Use GET /v1/models to discover the available IDs.
messages	array	Required	Conversation history. Each entry is { role: "system" \| "user" \| "assistant" \| "tool", content }. Content is either a string or an array of content blocks (for images, see below).
max_tokens	integer	Optional	Maximum number of tokens to generate. Capped at the model's max_output_tokens.
temperature	float	Optional	Sampling temperature, 0–2. Lower values are more deterministic. The default depends on the upstream provider.
top_p	float	Optional	Nucleus sampling. Use either temperature or top_p, not both.
stream	boolean	Optional	When true, the response is a stream of Server-Sent Events. See "Streaming" below.
models	array	Optional	Fallback models (max 3), e.g. ["deepseek-v3.2", "gpt-4o-mini"]. If every channel of the primary model fails, each candidate is tried in order. You are billed for — and response.model reports — the model that actually answered. Unknown or plan-gated candidates are skipped. With the OpenAI SDK pass it via extra_body.
transforms	array	Optional	Prompt transforms. Supported: ["middle-out"] — when the conversation overflows the model's context window, whole messages are dropped from the middle (system prompts, the first message and the most recent turns are kept), so long roleplay or agent histories keep working instead of erroring. Opt-in; off by default.
stream_options	object	Optional	{ include_usage: boolean }. Usage is always included in the final stream chunk; this field is accepted for OpenAI compatibility but cannot be turned off.
stop	string \| array	Optional	Up to 4 stop sequences. Generation stops as soon as one of them is produced.
tools	array	Optional	Function definitions the model may call. See "Tool calling" below.
tool_choice	string \| object	Optional	"auto" (default), "none", or { type: "function", function: { name } } to force a specific call.
response_format	object	Optional	{ type: "json_object" } forces the model to emit valid JSON. Ignored for models that do not support it.
reasoning_effort	string	Optional	Reasoning depth: "low" \| "medium" \| "high" \| "xhigh" \| "max". Any model with supports_reasoning: true (Claude, OpenAI o/GPT-5, Gemini, Qwen, DeepSeek, …). See "Reasoning & thinking".
thinking	string \| object	Optional	Cross-model thinking switch. "on" \| "off" \| "auto"; Anthropic-style { type: "enabled", budget_tokens: N }; hybrid { type: "enabled" \| "disabled" }. See "Reasoning & thinking".
thinking_budget	integer	Optional	Token limit for the model's reasoning trace (when the provider exposes one).
ignore_defaults	boolean	Optional	Skip the user's saved per-model default parameters (configured in the dashboard) for this request.
skill	string	Optional	ID of a single marketplace skill to apply to this request. The skill transforms your messages/parameters before the upstream call and overrides any installed-skill defaults. Consumed by Airforce, never forwarded upstream. See the Skills catalog at /docs/api/extend.
skills	array	Optional	Array of marketplace skill IDs applied in order, for stacking multiple skills on one request.

Basic example

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 200,
    "temperature": 0.7
  }'
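
The same call can be prepared from Python. This is an illustrative sketch rather than an official client: it uses only the standard library, the helper name build_chat_request is hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.airforce/v1"  # base URL taken from this page


def build_chat_request(api_key, model, messages, **params):
    """Build (but do not send) a POST /v1/chat/completions request.

    build_chat_request is a hypothetical helper name, not part of any SDK.
    """
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "sk-air-YOUR_API_KEY",
    "gpt-5.1-chat",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=200,
    temperature=0.7,
)

# To send it (needs a valid key and network access):
# with urllib.request.urlopen(request) as response:
#     completion = json.load(response)
```

Extra body fields such as stream or reasoning_effort can be passed as keyword arguments in the same way.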

Response shape

Parameter	Type	Required	Description
id	string	Optional	Stable completion ID, e.g. "chatcmpl-abc123".
object	string	Optional	"chat.completion" for non-streamed responses, "chat.completion.chunk" for streamed ones.
created	integer	Optional	Unix timestamp (seconds).
model	string	Optional	Echo of the requested model ID; when a fallback model answers, the model that actually answered.
choices	array	Optional	Array of completion candidates: [{ index, message: { role, content, tool_calls? }, finish_reason }].
choices[].finish_reason	string	Optional	"stop" \| "length" \| "tool_calls" \| "content_filter".
usage	object	Optional	{ prompt_tokens, completion_tokens, total_tokens, completion_tokens_details?, prompt_tokens_details?, cache_creation_input_tokens?, cache_creation? }. completion_tokens_details.reasoning_tokens is set when the model produced a reasoning trace. Cache fields appear when the upstream returns prompt-caching information: prompt_tokens_details.cached_tokens reports cache reads (the OpenAI standard), cache_creation_input_tokens totals the writes, and cache_creation.ephemeral_5m_input_tokens / ephemeral_1h_input_tokens give the per-TTL breakdown.

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-5.1-chat",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The capital of France is Paris."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 8,
    "total_tokens": 28
  }
}
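
Reading such a response usually comes down to three fields. A minimal sketch, assuming the non-streamed shape shown above; read_completion is a hypothetical helper name and the sample is a trimmed copy of the example response.

```python
def read_completion(completion):
    """Return (text, finish_reason, total_tokens) from a non-streamed response."""
    choice = completion["choices"][0]  # the examples on this page show one choice
    usage = completion.get("usage") or {}
    return (
        choice["message"].get("content"),
        choice.get("finish_reason"),
        usage.get("total_tokens"),
    )


# Trimmed copy of the sample response above.
sample = {
    "model": "gpt-5.1-chat",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "The capital of France is Paris."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 20, "completion_tokens": 8, "total_tokens": 28},
}

text, finish_reason, total_tokens = read_completion(sample)
print(text, finish_reason, total_tokens)
```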

Reasoning & thinking

Reasoning/thinking is a cross-model feature for every model ID with supports_reasoning: true — Claude, OpenAI o-series/GPT-5, Gemini, Qwen, DeepSeek, and others. You send the same canonical parameters; Airforce maps them to each provider's native shape. This is not a DeepSeek-only API.

Source of truth: check for supports_reasoning: true on a model in GET /v1/models (or GET /api/models/{id}/allowed-params). Prefer that flag over guessing from the model name.

Canonical parameters

Parameter	Type	Required	Description
reasoning_effort	string	Optional	"low" \| "medium" \| "high" \| "xhigh" \| "max". Accepted on every model with supports_reasoning: true. Some upstreams only honour a subset (e.g. high/max); others clamp unsupported levels to the nearest served value.
thinking	string \| object	Optional	Three accepted shapes (we normalise): "on" \| "off" \| "auto"; Anthropic-style { type: "enabled", budget_tokens: N }; hybrid { type: "enabled" \| "disabled" }. Mapped onto Claude extended thinking, OpenAI effort profiles, Gemini thinking_config, Qwen enable_thinking, DeepSeek hybrid, etc.
thinking_budget	integer	Optional	Maximum tokens the model may spend reasoning before emitting visible output. Mirrors budget_tokens when the upstream exposes a budget; takes precedence over reasoning_effort when both are sent and a budget is available.

What differs by family (mapping only)

Parameters are the same everywhere. Only how we map them (and how hard "off" is) differs:

Claude — Thinking on/off + budget; often also reasoning_effort via the gateway.
OpenAI (o1/o3, GPT-5) — Mainly reasoning_effort. A full "thinking off" is often not available — you control how strongly the model reasons, not always whether it reasons at all.
Gemini — thinking_config / budget mapped internally.
Qwen / Xiaomi / Alibaba — thinking + enable_thinking-style controls.
DeepSeek (generic) — Hybrid on/off is especially clear: thinking: { type: enabled|disabled } plus optional reasoning_effort.
Resellers / other — Often generic passthrough of the same canonical fields.

Controlling where the trace appears

An optional reasoning object on the request decides what happens to the thinking trace. It is consumed by Airforce and never forwarded upstream.

Parameter	Type	Required	Description
reasoning.format	string	Optional	"separate" (default) puts the trace in message.reasoning (and delta.reasoning while streaming). "inline" keeps the legacy inline <think>…</think> form inside content.
reasoning.exclude	boolean	Optional	When true, the reasoning trace is dropped entirely from the response. Reasoning tokens are still counted and billed if the model produced them.

"reasoning": { "format": "separate", "exclude": false }

Reasoning effort (OpenAI style)

Primary control for o-series and GPT-5: how much the model may reason. Same canonical field as on every other supports_reasoning model — OpenAI is included, but behaviour is not 1:1 with DeepSeek's hard on/off.

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "o3-mini",
    "messages": [{"role": "user", "content": "Prove the Pythagorean theorem."}],
    "reasoning_effort": "high"
  }'

Extended thinking (Anthropic style)

Budget-based thinking for Claude (and gateways that accept the Anthropic shape). You can still send reasoning_effort; we map when the channel supports it.

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "Plan a 7-day Italy trip."}],
    "thinking": {"type": "enabled", "budget_tokens": 4000}
  }'

Hybrid thinking (e.g. DeepSeek V3.2/V4)

Example of a hybrid model family with a clear Thinking / Non-Thinking switch — not a separate protocol. deepseek-v3.2, deepseek-v4-flash and deepseek-v4-pro accept the same canonical fields as every other supports_reasoning model. Toggle thinking and optionally set effort in one request:

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Solve this step by step: integrate x^2 * e^x."}],
    "thinking": {"type": "enabled"},
    "reasoning_effort": "high"
  }'

Turn thinking off (faster, cheaper when you only need the final answer) — this hard off is clearer on hybrid models than on many OpenAI o-series profiles:

"thinking": {"type": "disabled"}
// or simply: "thinking": "off"

Native docs for this family often list effort levels such as "high" and "max". We accept the full low…max scale and map unsupported levels to the nearest value that reaches the model. Prefer the hybrid IDs above over retired deepseek-chat / deepseek-reasoner names when you need an explicit on/off switch.

The reasoning trace itself appears in choices[0].message.reasoning (OpenAI shape) or as a thinking block in content (Anthropic shape). Reasoning tokens are billed and reported in usage.completion_tokens_details.reasoning_tokens.

The completion_tokens_details.reasoning_tokens breakdown is only present when the upstream provider reports it. In a streamed response, the trace arrives on delta.reasoning_content in each chunk.
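
The reasoning controls above can be combined on one request body. The sketch below is an illustration under stated assumptions: with_reasoning and split_answer are hypothetical helper names, and the sample message is invented to show the "separate" trace format.

```python
EFFORT_LEVELS = {"low", "medium", "high", "xhigh", "max"}  # values listed on this page


def with_reasoning(payload, effort=None, thinking=None,
                   trace_format="separate", exclude=False):
    """Return a copy of payload with the canonical reasoning fields set."""
    if effort is not None and effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown reasoning_effort: {effort!r}")
    out = dict(payload)
    if effort is not None:
        out["reasoning_effort"] = effort
    if thinking is not None:
        out["thinking"] = thinking  # "on" | "off" | "auto" or an object form
    out["reasoning"] = {"format": trace_format, "exclude": exclude}
    return out


def split_answer(message):
    """Return (reasoning_trace, visible_text) from a response message."""
    return message.get("reasoning"), message.get("content")


body = with_reasoning(
    {"model": "deepseek-v4-pro",
     "messages": [{"role": "user", "content": "Integrate x^2 * e^x."}]},
    effort="high",
    thinking={"type": "enabled"},
)

# Invented sample message, for illustration only.
trace, answer = split_answer(
    {"role": "assistant", "reasoning": "Integrate by parts twice.", "content": "Done."}
)
```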

Vision and image input

Models with supports_vision: true accept images embedded as content blocks. Either a public URL or a base64 data URL works; size limits depend on the upstream model.

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
      ]
    }]
  }'
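
For a local file, the base64 data URL form can be built as follows. A minimal sketch: image_block_from_bytes is a hypothetical helper name, and the three placeholder bytes stand in for the contents of a real image file.

```python
import base64


def image_block_from_bytes(data, mime_type="image/jpeg"):
    """Wrap raw image bytes in an image_url content block using a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Placeholder bytes; in practice read them with open(path, "rb").read().
image_bytes = b"\xff\xd8\xff"

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        image_block_from_bytes(image_bytes),
    ],
}
```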

Tool calling

Models with supports_tools: true can call functions you define. The model returns a tool_calls array; you run the call and then send the result back as a tool message.

Request

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }'

Response with a tool call

{
  "id": "chatcmpl-abc123",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"Paris\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

Tool result follow-up

{
  "model": "gpt-5.1-chat",
  "messages": [
    {"role": "user", "content": "What is the weather in Paris?"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"}
      }]
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp_c\": 14, \"sky\": \"cloudy\"}"}
  ]
}
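
The step between the tool-call response and the follow-up request can be sketched as below. This is an illustration, not a prescribed client: get_weather is a stand-in implementation, and LOCAL_TOOLS and run_tool_calls are hypothetical names.

```python
import json


def get_weather(location):
    """Stand-in implementation; a real one would call a weather service."""
    return {"temp_c": 14, "sky": "cloudy"}


LOCAL_TOOLS = {"get_weather": get_weather}  # tool name -> local callable


def run_tool_calls(assistant_message):
    """Execute each requested call and return the matching tool messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
        output = LOCAL_TOOLS[function["name"]](**arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the id the model issued
            "content": json.dumps(output),
        })
    return results


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"},
    }],
}

follow_up_messages = [
    {"role": "user", "content": "What is the weather in Paris?"},
    assistant_message,
    *run_tool_calls(assistant_message),
]
```

Sending follow_up_messages as the messages array of the next request reproduces the follow-up body shown above.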

Assistant prefill

End your messages array with an assistant message that already contains some text, and the model continues from it instead of starting a fresh turn. This is a reliable way to force a response to begin a specific way — a leading "{" for JSON, a chosen language, or a fixed prefix. The same trick works on /v1/messages. Providers that reject native prefill are handled automatically: the gateway retries once with a compatible rewrite, so you do not have to special-case them.

{
  "model": "claude-sonnet-4.6",
  "messages": [
    {"role": "user", "content": "List three primary colors as a JSON array."},
    {"role": "assistant", "content": "["}
  ]
}

Structured outputs

Set response_format to make the model return JSON. Two modes are supported:

{ "type": "json_object" } — the response is a single valid JSON value.
{ "type": "json_schema", "json_schema": { "name", "schema", "strict" } } — the model is steered to produce JSON that matches your JSON Schema.

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "Extract the city and country: I live in Paris, France."}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "location",
        "schema": {
          "type": "object",
          "properties": { "city": {"type": "string"}, "country": {"type": "string"} },
          "required": ["city", "country"]
        }
      }
    }
  }'

Reliability: even when a model wraps its answer in prose or a markdown code fence, Airforce extracts the JSON payload so you always receive parseable content. If no valid JSON can be recovered, the original text is returned unchanged — so the guarantee never makes a response worse. This applies to non-streamed responses; streamed responses are passed through unchanged.
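
Because streamed responses are passed through unchanged, a client that streams may want a similar recovery step of its own. The sketch below is one possible approach and is not Airforce's implementation; extract_json is a hypothetical helper name.

```python
import json
import re

TICKS = "`" * 3  # markdown code-fence delimiter
FENCE = re.compile(TICKS + r"(?:json)?\s*(.*?)" + TICKS, re.DOTALL)


def extract_json(text):
    """Return the first JSON value recoverable from text, or None.

    Note that a literal JSON null also yields None.
    """
    candidates = [text.strip()] + [m.strip() for m in FENCE.findall(text)]
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            pass
    # Last resort: try decoding from each opening brace or bracket.
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char in "{[":
            try:
                value, _ = decoder.raw_decode(text, index)
                return value
            except json.JSONDecodeError:
                continue
    return None


fenced = "Sure! " + TICKS + 'json\n{"city": "Paris", "country": "France"}\n' + TICKS
location = extract_json(fenced)
```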

Streaming

Set stream: true to receive partial completions as Server-Sent Events. Each event is a JSON chunk with the same shape as the non-streamed response, except that message is replaced by delta. The stream ends with data: [DONE].

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "Write a haiku about Berlin."}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Wire format

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"content":"Cold "},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"content":"stone "},"finish_reason":null}]}

…

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":17,"total_tokens":29}}

data: [DONE]
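
A client folds these chunks back into one answer. A minimal sketch, assuming the lines have already been read from the HTTP response; collect_stream is a hypothetical helper name and the sample lines are trimmed copies of the chunks above.

```python
import json


def collect_stream(lines):
    """Fold chat.completion.chunk SSE lines into (text, finish_reason, usage)."""
    pieces, finish_reason, usage = [], None, None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip the blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        usage = chunk.get("usage") or usage
        for choice in chunk.get("choices", []):
            pieces.append(choice.get("delta", {}).get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(pieces), finish_reason, usage


sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    '',
    'data: {"choices":[{"index":0,"delta":{"content":"Cold "},"finish_reason":null}]}',
    '',
    'data: {"choices":[{"index":0,"delta":{"content":"stone "},"finish_reason":null}]}',
    '',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":12,"completion_tokens":17,"total_tokens":29}}',
    '',
    'data: [DONE]',
]

streamed_text, streamed_finish, streamed_usage = collect_stream(sample_lines)
```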

Reliability & smart routing

Every model ID resolves to a pool of upstream providers behind the scenes. If the first one errors or times out, the request is automatically retried against the next provider for the same model, in order, before any failure is returned — you do not configure or trigger this. The model field in the response always reports the variant that actually answered. This is independent of the optional models / fallbacks array, which adds your own cross-model candidates on top: first the primary model exhausts its own provider chain, then each fallback model exhausts its chain.
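
Adding your own cross-model candidates is a one-field change to the request body. A small sketch, with with_fallbacks as a hypothetical helper name; the limit of three comes from the parameter tables on this page.

```python
MAX_FALLBACKS = 3  # "max 3" per the parameter tables on this page


def with_fallbacks(payload, fallback_models, anthropic_form=False):
    """Return a copy of payload with fallback candidates attached.

    anthropic_form=False -> "models": ["a", "b"]              (/v1/chat/completions)
    anthropic_form=True  -> "fallbacks": [{"model": "a"}, ...] (/v1/messages)
    """
    if len(fallback_models) > MAX_FALLBACKS:
        raise ValueError(f"at most {MAX_FALLBACKS} fallback models are allowed")
    out = dict(payload)
    if anthropic_form:
        out["fallbacks"] = [{"model": name} for name in fallback_models]
    else:
        out["models"] = list(fallback_models)
    return out


chat_body = with_fallbacks(
    {"model": "gpt-5.1-chat", "messages": [{"role": "user", "content": "Hi"}]},
    ["deepseek-v3.2", "gpt-4o-mini"],
)
```

With the OpenAI SDK the models field is not a native argument, so it is passed through extra_body as noted in the request table.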

POST /v1/messages

Anthropic-compatible Messages API. Works with the official @anthropic-ai/sdk by setting baseURL to https://api.airforce. Requests for non-Claude models are forwarded transparently to OpenAI, Google, and other providers.

POST https://api.airforce/v1/messages

Request body

Parameter	Type	Required	Description
model	string	Required	Model ID (in Anthropic format, or a routed alias).
messages	array	Required	Each entry: { role: "user" \| "assistant", content: string \| array }.
max_tokens	integer	Required	Required by Anthropic. Token limit for the response.
system	string \| array	Optional	System prompt. Pass an array of { type: "text", text, cache_control? } blocks to mark cached prefix segments. See "Prompt caching".
temperature	float	Optional	0–1.
top_p	float	Optional	Nucleus sampling.
top_k	integer	Optional	Restrict the sampling pool to the top K tokens.
stop_sequences	array	Optional	Up to 4 stop sequences.
stream	boolean	Optional	When true, emits an Anthropic-style SSE event stream (see "Streaming events").
fallbacks	array	Optional	Fallback models (max 3) in Anthropic form: [{"model": "gpt-4o-mini"}]. If every channel of the primary model fails, each candidate is tried in order; you are billed for — and the response model field reports — the model that actually answered. A plain models string array is accepted too.
tools	array	Optional	Anthropic tool definitions: { name, description, input_schema }. The response may contain tool_use content blocks.
tool_choice	object	Optional	{ type: "auto" \| "any" \| "tool", name? }.
thinking	object	Optional	Anthropic extended thinking: { type: "enabled", budget_tokens: N }.

Example

curl https://api.airforce/v1/messages \
  -H "x-api-key: sk-air-YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "max_tokens": 256,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'

Response shape

Parameter	Type	Required	Description
id	string	Optional	Message ID, e.g. "msg_01ABCxyz".
type	string	Optional	Always "message".
role	string	Optional	Always "assistant".
content	array	Optional	Array of content blocks: { type: "text" \| "tool_use" \| "thinking", … }.
model	string	Optional	Echo of the requested model; when a fallback model answers, the model that actually answered.
stop_reason	string	Optional	"end_turn" \| "max_tokens" \| "stop_sequence" \| "tool_use".
usage	object	Optional	{ input_tokens, output_tokens, cache_read_input_tokens?, cache_creation_input_tokens?, cache_creation? }. Cache fields appear when prompt caching is used. cache_creation.ephemeral_5m_input_tokens and ephemeral_1h_input_tokens give the per-TTL breakdown of writes.

Streaming events

Anthropic SSE uses named events rather than bare JSON chunks. Each event has both an event: name and a data: JSON payload.

event: message_start
data: {"type":"message_start","message":{"id":"msg_01","role":"assistant","content":[],"model":"claude-sonnet-4.6","stop_reason":null,"usage":{"input_tokens":12,"output_tokens":1}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":17}}

event: message_stop
data: {"type":"message_stop"}
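
The named events can be folded into a final message in much the same way as the OpenAI chunks. A minimal sketch, assuming the lines have already been read from the response; collect_message_stream is a hypothetical helper name and only text deltas are handled.

```python
import json


def collect_message_stream(lines):
    """Fold Anthropic-style SSE lines into (text, stop_reason, usage)."""
    pieces, stop_reason, usage = [], None, {}
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # the event: line repeats the type carried in the payload
        event = json.loads(line[len("data:"):].strip())
        kind = event.get("type")
        if kind == "message_start":
            usage.update(event["message"].get("usage", {}))
        elif kind == "content_block_delta":
            delta = event["delta"]
            if delta.get("type") == "text_delta":
                pieces.append(delta["text"])
        elif kind == "message_delta":
            stop_reason = event["delta"].get("stop_reason", stop_reason)
            usage.update(event.get("usage", {}))  # final output_tokens count
        elif kind == "message_stop":
            break
    return "".join(pieces), stop_reason, usage


sample_events = [
    'event: message_start',
    'data: {"type":"message_start","message":{"id":"msg_01","role":"assistant",'
    '"content":[],"usage":{"input_tokens":12,"output_tokens":1}}}',
    'event: content_block_delta',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}',
    'event: message_delta',
    'data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":17}}',
    'event: message_stop',
    'data: {"type":"message_stop"}',
]

message_text, stop_reason, message_usage = collect_message_stream(sample_events)
```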

POST /v1/messages/count_tokens

Anthropic-compatible token counting. Send the same system / messages / tools you would pass to /v1/messages and get an input-token estimate back without running the model — nothing is billed.

POST https://api.airforce/v1/messages/count_tokens

curl https://api.airforce/v1/messages/count_tokens \
  -H "x-api-key: sk-air-YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "Hello, Claude!"}]
  }'

# → {"input_tokens": 34}

The count is a fast character-based estimate (about 4 characters per token) over system, messages and tools — close enough for context-budget checks, not an exact tokenizer run.

Prompt caching

On /v1/messages with Claude models, mark a prefix as cacheable by passing system as an array of blocks in which the segment to cache carries cache_control: { type: "ephemeral" }. Subsequent requests that start with the same prefix are charged at the cheaper cache-read rate. Models with supports_caching: true in /v1/models support this.

Write vs read pricing

Cache writes are typically charged slightly above normal input (about 1.25× on Claude-family models). Cache reads are much cheaper (about 0.1× input). A large write with almost no later read is the expensive case — not a “cache discount”. Only reusing the same prefix turns the write into savings.

Tools like Claude Code often attach a large project context with cache markers on the first turns. Expect cache-write spend while the repo/system prefix is loaded; later turns only get cheap if that prefix is stable and reused. Subagents and multi-step agents can multiply large contexts across several requests.

{
  "model": "claude-sonnet-4.6",
  "max_tokens": 1024,
  "system": [
    {"type": "text", "text": "You are a senior staff engineer at Airforce."},
    {
      "type": "text",
      "text": "<repository-snapshot>...</repository-snapshot>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [
    {"role": "user", "content": "Where is rate limiting enforced?"}
  ]
}
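
Building the system array in code keeps the cached segment identical across calls. A small sketch, with cached_system as a hypothetical helper name; the optional ttl value "1h" is the longer cache lifetime this page documents.

```python
def cached_system(instructions, shared_prefix, ttl=None):
    """Return a system block array whose last block carries a cache marker."""
    marker = {"type": "ephemeral"}
    if ttl is not None:
        marker["ttl"] = ttl  # e.g. "1h"; omitted means the default lifetime
    return [
        {"type": "text", "text": instructions},
        {"type": "text", "text": shared_prefix, "cache_control": marker},
    ]


request_body = {
    "model": "claude-sonnet-4.6",
    "max_tokens": 1024,
    "system": cached_system(
        "You are a senior staff engineer at Airforce.",
        "<repository-snapshot>...</repository-snapshot>",
    ),
    "messages": [{"role": "user", "content": "Where is rate limiting enforced?"}],
}
```

Keeping the shared prefix in one constant and only varying the messages array is what lets later calls hit the cache.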

How cache counts are reported in the response

Cache token counts are passed through in each format's native shape, so SDKs (openai, @anthropic-ai/sdk, @google/genai) read them without custom code. Fields are omitted when their value is zero, which keeps uncached responses lean.

/v1/chat/completions (OpenAI shape)

"usage": {
  "prompt_tokens": 2104,
  "completion_tokens": 147,
  "total_tokens": 2251,
  "prompt_tokens_details": { "cached_tokens": 1980 },
  "cache_creation_input_tokens": 124,
  "cache_creation": {
    "ephemeral_5m_input_tokens": 124,
    "ephemeral_1h_input_tokens": 0
  }
}

/v1/messages (Anthropic shape)

"usage": {
  "input_tokens": 2104,
  "output_tokens": 147,
  "cache_read_input_tokens": 1980,
  "cache_creation_input_tokens": 124,
  "cache_creation": {
    "ephemeral_5m_input_tokens": 124,
    "ephemeral_1h_input_tokens": 0
  }
}

/v1beta/.../generateContent (Gemini shape)

"usageMetadata": {
  "promptTokenCount": 2104,
  "candidatesTokenCount": 147,
  "totalTokenCount": 2251,
  "cachedContentTokenCount": 1980
}

Where caching applies

Explicit cache_control markers are honored for Claude models on both /v1/messages and /v1/chat/completions; put them on system or message content blocks. Many other providers (the OpenAI family, DeepSeek, Gemini) cache automatically: you send no marker, and you simply see cached_tokens in the response once a long enough prefix is reused.

Cache lifetime: 5 minutes or 1 hour

A cached prefix lives for 5 minutes by default, and every hit resets the timer. For a longer-lived prefix, add ttl: "1h" to the marker. The response reports each TTL separately under cache_creation.

"cache_control": { "type": "ephemeral", "ttl": "1h" }

Example: write first, then read

Send exactly the same request twice (the caching example above). The first call to see the prefix pays a one-time cache write; identical calls within the TTL pay the much cheaper cache read.

First call, cache write (usage excerpt):

"usage": {
  "input_tokens": 2104,
  "output_tokens": 12,
  "cache_creation_input_tokens": 1980,
  "cache_read_input_tokens": 0
}

Second identical call within the TTL, cache read:

"usage": {
  "input_tokens": 2104,
  "output_tokens": 12,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 1980
}

Limits and cost

Claude requires a minimum cacheable prefix (about 1024 tokens; more on some models). Shorter prefixes are simply not cached.
At most 4 cache breakpoints are allowed per request, and the cached prefix must be byte-for-byte identical across calls; even a one-character change misses the cache.
Cache writes cost more than normal input (5m ≈ 1.25×, 1h ≈ 2×); reads are much cheaper (≈ 0.1×). See each model's cache prices on the pricing page.
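
Whether caching pays off depends on how often the prefix is reused. The sketch below works this through in input-token equivalents using the approximate multipliers quoted above; it is a rough estimate rather than a billing formula, actual prices are on the pricing page, and the function names are hypothetical.

```python
WRITE_MULTIPLIER = {"5m": 1.25, "1h": 2.0}  # approximate, from this page
READ_MULTIPLIER = 0.1                        # approximate, from this page


def cached_prefix_cost(prefix_tokens, calls, ttl="5m"):
    """Cost of the prefix over `calls` requests, in input-token equivalents.

    Assumes the first call writes the cache and every later call hits it.
    """
    if calls < 1:
        return 0.0
    write = prefix_tokens * WRITE_MULTIPLIER[ttl]
    reads = prefix_tokens * READ_MULTIPLIER * (calls - 1)
    return write + reads


def uncached_prefix_cost(prefix_tokens, calls):
    """Cost of resending the same prefix at the normal input rate."""
    return float(prefix_tokens * calls)


for calls in (1, 2, 10):
    print(calls,
          round(cached_prefix_cost(1980, calls)),
          round(uncached_prefix_cost(1980, calls)))
```

With the 1,980-token prefix from the example, a single call costs about 2,475 token equivalents cached against 1,980 uncached, while two calls cost about 2,673 against 3,960. Under these multipliers the write is already recovered on the first reuse.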

POST /v1/responses

OpenAI Responses API surface for stateful conversations. Same Bearer/x-api-key authentication. Cache counts appear as input_tokens_details.cached_tokens (reads) plus the flat cache_creation_input_tokens and cache_creation.ephemeral_* fields (writes), for parity with /v1/chat/completions.

POST https://api.airforce/v1/responses

POST /v1beta/models/{model}:generateContent

Google Gemini-compatible endpoint. Works with the official @google/genai SDK and the Gemini CLI by pointing the base URL at https://api.airforce/v1beta. Any routed model works — requests are translated to and from the native Gemini shape, and the model is taken from the URL path (not the body).

POST https://api.airforce/v1beta/models/{model}:generateContent

Authentication

Pass your Airforce API key any of the three ways Google clients use:

# 1) query parameter (Google default)
?key=sk-air-YOUR_API_KEY

# 2) header
x-goog-api-key: sk-air-YOUR_API_KEY

# 3) bearer token
Authorization: Bearer sk-air-YOUR_API_KEY

Request body

Parameter	Type	Required	Description
contents	array	Required	Conversation turns. Each: { role: "user" \| "model", parts: [...] }. A part is { text }, { functionCall: { name, args } }, or { functionResponse: { name, response } }. "model" is Gemini's term for the assistant role.
systemInstruction	object	Optional	System prompt: { parts: [{ text }] }.
generationConfig	object	Optional	{ temperature, maxOutputTokens, topP, stopSequences } — mapped to the canonical sampling parameters.
tools	array	Optional	Tool definitions: [{ functionDeclarations: [{ name, description, parameters }] }]. functionDeclarations are flattened across entries.
toolConfig	object	Optional	Tool-choice control: { functionCallingConfig: { mode: "AUTO" \| "ANY" \| "NONE" } }. ANY forces a call, NONE disables tools.

Example

curl "https://api.airforce/v1beta/models/gemini-3.1-pro:generateContent" \
  -H "x-goog-api-key: sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "What is the capital of France?"}]}
    ],
    "systemInstruction": {"parts": [{"text": "You are a helpful assistant."}]},
    "generationConfig": {"temperature": 0.7, "maxOutputTokens": 256}
  }'
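
If your code already holds OpenAI-style messages, the Gemini body can be derived from them. A minimal sketch under stated assumptions: to_gemini_request is a hypothetical helper name, and it handles only system, user and assistant messages with string content (no tool calls or images).

```python
def to_gemini_request(messages, temperature=None, max_output_tokens=None):
    """Translate OpenAI-style chat messages into a generateContent body."""
    body = {"contents": []}
    system_parts = []
    for message in messages:
        part = {"text": message["content"]}
        if message["role"] == "system":
            system_parts.append(part)
        else:
            # "model" is Gemini's term for the assistant role.
            role = "model" if message["role"] == "assistant" else "user"
            body["contents"].append({"role": role, "parts": [part]})
    if system_parts:
        body["systemInstruction"] = {"parts": system_parts}
    config = {}
    if temperature is not None:
        config["temperature"] = temperature
    if max_output_tokens is not None:
        config["maxOutputTokens"] = max_output_tokens
    if config:
        body["generationConfig"] = config
    return body


gemini_body = to_gemini_request(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.7,
    max_output_tokens=256,
)
```

The model is not part of this body; it goes in the URL path.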

Response shape

Parameter	Type	Required	Description
candidates	array	Optional	Generated turns: [{ content: { role: "model", parts }, finishReason, index }]. Only the first candidate is populated.
candidates[].finishReason	string	Optional	"STOP" \| "MAX_TOKENS" \| "SAFETY" \| "OTHER".
usageMetadata	object	Optional	{ promptTokenCount, candidatesTokenCount, totalTokenCount, cachedContentTokenCount? }. cachedContentTokenCount appears when the upstream reported a cache read.
modelVersion	string	Optional	Echo of the requested model.

{
  "candidates": [{
    "content": {
      "role": "model",
      "parts": [{"text": "The capital of France is Paris."}]
    },
    "finishReason": "STOP",
    "index": 0
  }],
  "usageMetadata": {
    "promptTokenCount": 16,
    "candidatesTokenCount": 8,
    "totalTokenCount": 24
  },
  "modelVersion": "gemini-3.1-pro"
}

POST /v1beta/models/{model}:streamGenerateContent

Streaming uses the :streamGenerateContent action and returns Server-Sent Events. Each data: line is a full Gemini-shaped chunk (not a delta object); the final chunk carries usageMetadata.

data: {"candidates":[{"content":{"role":"model","parts":[{"text":"The capital"}]},"index":0}],"modelVersion":"gemini-3.1-pro"}

data: {"candidates":[{"content":{"role":"model","parts":[{"text":" is Paris."}]},"index":0}],"modelVersion":"gemini-3.1-pro"}

data: {"candidates":[{"content":{"role":"model","parts":[]},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":16,"candidatesTokenCount":8,"totalTokenCount":24}}

List models

The catalog is also exposed in Gemini Model-resource shape so Google clients can enumerate models.

curl https://api.airforce/v1beta/models

Notes: the base URL is https://api.airforce/v1beta (or /v1), not Google's host. The model name comes from the URL path, not the request body. Only the first candidate is returned, and a subset of Gemini fields is translated — safetySettings and cachedContent are currently ignored. Billing, rate limits and smart routing apply exactly as on /v1/chat/completions.

Errors

Airforce returns standard HTTP status codes and a uniform error envelope on both endpoints.

Status	Type	Description
400	invalid_request_error	Malformed JSON, missing required field, unknown model.
401	invalid_request_error / auth_required	Missing or invalid API key.
402	insufficient_quota	The model requires an active subscription or a positive Pay-as-you-Go balance.
403	model_access_denied / insufficient_scope	Plan or per-key permissions deny this request.
404	model_not_found	The requested model does not exist or you do not have access to it.
429	rate_limit_error	Request rate or daily token limit exceeded.
503	api_error / moderation_unavailable	All upstream keys for the requested provider failed.

{
  "error": {
    "message": "The requested model does not exist or you do not have access to it.",
    "type": "model_not_found",
    "param": null,
    "code": "404"
  }
}

The descriptive slug lives in type. code is the HTTP status as a string (e.g. "404"), and param is null except for parameter-range validation errors, where it names the offending parameter.
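
Client code can branch on type and the HTTP status. The sketch below is illustrative: parse_error is a hypothetical helper name, and treating 429 and 503 as retryable is an assumption of this example, not a rule stated by the API.

```python
import json

RETRYABLE_STATUSES = {429, 503}  # assumption: treat these as transient


def parse_error(status, body_text):
    """Return (type, message, param, retryable) from an error response."""
    try:
        error = json.loads(body_text).get("error") or {}
    except (json.JSONDecodeError, AttributeError):
        error = {}  # body was not the documented JSON envelope
    return (
        error.get("type"),
        error.get("message", body_text),
        error.get("param"),
        status in RETRYABLE_STATUSES,
    )


sample_body = json.dumps({
    "error": {
        "message": "The requested model does not exist or you do not have access to it.",
        "type": "model_not_found",
        "param": None,
        "code": "404",
    }
})

error_type, error_message, error_param, retryable = parse_error(404, sample_body)
```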

Discover models

For the full list of model IDs and capability flags (vision, tools, reasoning, caching, context length, …), see /docs/api/models.

curl https://api.airforce/v1/models \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"
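
Once the catalog is fetched, picking models by capability is a simple filter. This sketch assumes the response uses an OpenAI-style { "data": [...] } list envelope, which this page does not confirm; models_with is a hypothetical helper name, and the flag values in the sample are invented for illustration, not real capability data.

```python
def models_with(catalog, *flags):
    """Return IDs of models for which every named capability flag is true."""
    entries = catalog.get("data", [])  # assumed envelope, see the note above
    return [
        entry["id"]
        for entry in entries
        if all(entry.get(flag) is True for flag in flags)
    ]


# Invented sample; real flags come from GET /v1/models.
sample_catalog = {
    "data": [
        {"id": "model-a", "supports_vision": True, "supports_tools": True},
        {"id": "model-b", "supports_vision": False, "supports_tools": True},
        {"id": "model-c", "supports_reasoning": True},
    ]
}

vision_and_tools = models_with(sample_catalog, "supports_vision", "supports_tools")
```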