API REFERENCE

Chat Completions

Generate chat responses across more than 100 models from a single API. Drop-in compatible with OpenAI Chat Completions, Anthropic Messages, and the OpenAI Responses API.

Airforce speaks both the OpenAI Chat Completions and Anthropic Messages wire formats over the same set of models. Pick the SDK you already use and change only the base URL; non-Claude models are routed transparently behind either interface.

This page covers authentication, the request and response shapes of both interfaces, streaming, tool calling, vision, reasoning, and prompt caching. New here? Start with the basic example below, get one call working, then add streaming, tools, or caching.

Authentication

Every request needs a bearer token (your Airforce API key). The Anthropic x-api-key header is also accepted on /v1/messages for SDK compatibility.

Authorization: Bearer sk-air-YOUR_API_KEY
# alt for /v1/messages:
x-api-key: sk-air-YOUR_API_KEY

POST /v1/chat/completions

OpenAI-compatible chat completions. Works with the official openai SDK by overriding base_url to https://api.airforce/v1.

POST https://api.airforce/v1/chat/completions

Request body

Parameter	Type	Required	Description
model	string	Required	Model ID. Use GET /v1/models to discover the available IDs.
messages	array	Required	Conversation history. Each entry has { role: "system" \| "user" \| "assistant" \| "tool", content }. content is a string or an array of content blocks (for vision, see below).
max_tokens	integer	Optional	Maximum number of tokens to generate. Capped at the model's max_output_tokens.
temperature	float	Optional	Sampling temperature, 0-2. Lower is more deterministic. The default depends on the upstream provider.
top_p	float	Optional	Nucleus sampling. Use either temperature or top_p, not both.
stream	boolean	Optional	When true, the response is a stream of server-sent events. See the streaming section below.
models	array	Optional	Fallback models (max 3), e.g. ["deepseek-v3.2", "gpt-4o-mini"]. If every channel of the primary model fails, each candidate is tried in order. You are billed for — and response.model reports — the model that actually answered. Unknown or plan-gated candidates are skipped. With the OpenAI SDK pass it via extra_body.
transforms	array	Optional	Prompt transforms. Supported: ["middle-out"] — when the conversation overflows the model's context window, whole messages are dropped from the middle (system prompts, the first message and the most recent turns are kept), so long roleplay or agent histories keep working instead of erroring. Opt-in; off by default.
stream_options	object	Optional	{ include_usage: boolean }. Usage is always included in the final stream chunk; this field is accepted for OpenAI compatibility but cannot be turned off.
stop	string \| array	Optional	Up to 4 stop sequences. Generation stops as soon as one of them is produced.
tools	array	Optional	Function definitions the model may call. See the tool calling section below.
tool_choice	string \| object	Optional	"auto" (default), "none", or { type: "function", function: { name } } to force a specific call.
response_format	object	Optional	{ type: "json_object" } forces the model to emit valid JSON. Ignored for models that do not support it.
reasoning_effort	string	Optional	Reasoning depth: "low" \| "medium" \| "high" \| "xhigh" \| "max". Any model with supports_reasoning: true (Claude, OpenAI o/GPT-5, Gemini, Qwen, DeepSeek, …). See "Reasoning & thinking".
thinking	string \| object	Optional	Cross-model thinking switch. "on" \| "off" \| "auto"; Anthropic-style { type: "enabled", budget_tokens: N }; hybrid { type: "enabled" \| "disabled" }. See "Reasoning & thinking".
thinking_budget	integer	Optional	Maximum number of tokens for the model's reasoning trace (when the provider exposes a budget).
ignore_defaults	boolean	Optional	Skip the saved per-model default parameters (configured in the dashboard) for this request.
skill	string	Optional	ID of a single marketplace skill to apply to this request. The skill transforms your messages/parameters before the upstream call and overrides any installed-skill defaults. Consumed by Airforce, never forwarded upstream. See the Skills catalog at /docs/api/extend.
skills	array	Optional	Array of marketplace skill IDs applied in order, for stacking multiple skills on one request.

Basic example

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 200,
    "temperature": 0.7
  }'
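
The same call can be made from Python without any SDK. The sketch below is one possible shape, not an official client: build_chat_request and send are hypothetical helper names, and only the URL, headers, and body fields come from this page. Nothing is sent until send is called.

```python
import json
import urllib.request

BASE_URL = "https://api.airforce/v1"  # base URL documented on this page


def build_chat_request(api_key, model, messages, **params):
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


req = build_chat_request(
    "sk-air-YOUR_API_KEY",
    "gpt-5.1-chat",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=200,
    temperature=0.7,
)
# reply = send(req)  # uncomment once a real key is in place
# print(reply["choices"][0]["message"]["content"])
```

Optional fields from the request table (for example stream or reasoning_effort) can be passed as extra keyword arguments, since they are merged into the JSON body as-is.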

Response shape

Parameter	Type	Required	Description
id	string	Optional	Stable completion ID, e.g. "chatcmpl-abc123".
object	string	Optional	"chat.completion" for non-streamed responses, "chat.completion.chunk" for streamed ones.
created	integer	Optional	Unix timestamp (in seconds).
model	string	Optional	ID of the model that answered. This echoes the requested ID unless a fallback from models answered instead.
choices	array	Optional	Array of completion candidates: [{ index, message: { role, content, tool_calls? }, finish_reason }].
choices[].finish_reason	string	Optional	"stop" \| "length" \| "tool_calls" \| "content_filter".
usage	object	Optional	{ prompt_tokens, completion_tokens, total_tokens, completion_tokens_details?, prompt_tokens_details?, cache_creation_input_tokens?, cache_creation? }. completion_tokens_details.reasoning_tokens is set when the model produces a reasoning trace. The cache fields appear when the upstream returns prompt-caching information: prompt_tokens_details.cached_tokens reports cache reads (the OpenAI standard), cache_creation_input_tokens totals the writes, and cache_creation.ephemeral_5m_input_tokens / ephemeral_1h_input_tokens give the split by TTL.

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-5.1-chat",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The capital of France is Paris."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 8,
    "total_tokens": 28
  }
}

Reasoning & thinking

Reasoning/thinking is a cross-model feature for every model ID with supports_reasoning: true — Claude, OpenAI o-series/GPT-5, Gemini, Qwen, DeepSeek, and others. You send the same canonical parameters; Airforce maps them to each provider's native shape. This is not a DeepSeek-only API.

Source of truth: check for supports_reasoning: true on the model in GET /v1/models (or GET /api/models/{id}/allowed-params). Prefer that flag over guessing from the model name.

Canonical parameters

Parameter	Type	Required	Description
reasoning_effort	string	Optional	"low" \| "medium" \| "high" \| "xhigh" \| "max". Accepted on every model with supports_reasoning: true. Some upstreams only honour a subset (e.g. high/max); others clamp unsupported levels to the nearest served value.
thinking	string \| object	Optional	Three accepted shapes (we normalise): "on" \| "off" \| "auto"; Anthropic-style { type: "enabled", budget_tokens: N }; hybrid { type: "enabled" \| "disabled" }. Mapped onto Claude extended thinking, OpenAI effort profiles, Gemini thinking_config, Qwen enable_thinking, DeepSeek hybrid, etc.
thinking_budget	integer	Optional	Maximum tokens the model may spend reasoning before emitting visible output. Mirrors budget_tokens when the upstream exposes a budget; takes precedence over reasoning_effort when both are sent and a budget is available.

What differs by family (mapping only)

Parameters are the same everywhere. Only how we map them (and how hard "off" is) differs:

Claude — Thinking on/off + budget; often also reasoning_effort via the gateway.
OpenAI (o1/o3, GPT-5) — Mainly reasoning_effort. A full "thinking off" is often not available — you control how strongly the model reasons, not always whether it reasons at all.
Gemini — thinking_config / budget mapped internally.
Qwen / Xiaomi / Alibaba — thinking + enable_thinking-style controls.
DeepSeek (generic) — Hybrid on/off is especially clear: thinking: { type: enabled|disabled } plus optional reasoning_effort.
Resellers / other — Often generic passthrough of the same canonical fields.

Controlling where the trace appears

An optional reasoning object on the request decides what happens to the thinking trace. It is consumed by Airforce and never forwarded upstream.

Parameter	Type	Required	Description
reasoning.format	string	Optional	"separate" (default) puts the trace in message.reasoning (and delta.reasoning while streaming). "inline" keeps the legacy inline <think>…</think> form inside content.
reasoning.exclude	boolean	Optional	When true, the reasoning trace is dropped entirely from the response. Reasoning tokens are still counted and billed if the model produced them.

"reasoning": { "format": "separate", "exclude": false }

Reasoning effort (OpenAI style)

Primary control for o-series and GPT-5: how much the model may reason. Same canonical field as on every other supports_reasoning model — OpenAI is included, but behaviour is not 1:1 with DeepSeek's hard on/off.

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "o3-mini",
    "messages": [{"role": "user", "content": "Prove the Pythagorean theorem."}],
    "reasoning_effort": "high"
  }'

Extended thinking (Anthropic style)

Budget-based thinking for Claude (and gateways that accept the Anthropic shape). You can still send reasoning_effort; we map when the channel supports it.

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "Plan a 7-day Italy trip."}],
    "thinking": {"type": "enabled", "budget_tokens": 4000}
  }'

Hybrid thinking (e.g. DeepSeek V3.2/V4)

Example of a hybrid model family with a clear Thinking / Non-Thinking switch — not a separate protocol. deepseek-v3.2, deepseek-v4-flash and deepseek-v4-pro accept the same canonical fields as every other supports_reasoning model. Toggle thinking and optionally set effort in one request:

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Solve this step by step: integrate x^2 * e^x."}],
    "thinking": {"type": "enabled"},
    "reasoning_effort": "high"
  }'

Turn thinking off (faster, cheaper when you only need the final answer) — this hard off is clearer on hybrid models than on many OpenAI o-series profiles:

"thinking": {"type": "disabled"}
// or simply: "thinking": "off"

Native docs for this family often list effort levels such as "high" and "max". We accept the full low…max scale and map unsupported levels to the nearest value that reaches the model. Prefer the hybrid IDs above over retired deepseek-chat / deepseek-reasoner names when you need an explicit on/off switch.

The reasoning trace itself appears in choices[0].message.reasoning (OpenAI shape) or as thinking blocks in content (Anthropic format). Reasoning tokens are billed and reported in usage.completion_tokens_details.reasoning_tokens.

The completion_tokens_details.reasoning_tokens breakdown is present only when the upstream provider reports it. In a streamed response the trace arrives on delta.reasoning_content in each chunk.
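
Because the trace can arrive in two shapes, client code often normalises it before use. The helper below is a minimal sketch: split_reasoning is a hypothetical name, and the "thinking" key inside each thinking block is assumed to follow Anthropic's native block layout rather than anything this page spells out.

```python
def split_reasoning(message):
    """Return (reasoning_text, answer_text) from an assistant message dict."""
    content = message.get("content")
    # OpenAI shape: the trace sits in a separate field, content is a string.
    if content is None or isinstance(content, str):
        return message.get("reasoning") or "", content or ""
    # Anthropic shape: content is a list of typed blocks.
    reasoning, answer = [], []
    for block in content:
        if block.get("type") == "thinking":
            reasoning.append(block.get("thinking", ""))  # assumed field name
        elif block.get("type") == "text":
            answer.append(block.get("text", ""))
    return "".join(reasoning), "".join(answer)
```

Pass choices[0].message from a /v1/chat/completions response, or the whole message object from /v1/messages; both carry a content key, so the same function covers either endpoint.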

Vision and image input

Models with supports_vision: true accept inline images as content blocks. Either a public URL or a base64 data URL works; size limits depend on the upstream model.

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
      ]
    }]
  }'
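
For a local file, the base64 data URL variant can be built with the standard library. This is a small sketch; image_block_from_bytes is a hypothetical helper name, and the default MIME type is only an example.

```python
import base64


def image_block_from_bytes(data, mime="image/jpeg"):
    """Wrap raw image bytes in an image_url content block using a data URL."""
    b64 = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}


def vision_message(prompt, image_bytes, mime="image/jpeg"):
    """Build a user message with one text block and one image block."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            image_block_from_bytes(image_bytes, mime),
        ],
    }
```

Reading the file with open(path, "rb").read() and passing the bytes to vision_message yields a message that can be dropped into the messages array in place of the URL form shown above.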

Tool calling

Models with supports_tools: true can call functions you define. The model returns a tool_calls array; you run the call, then send the result back in a tool message.

Request

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }'

Response with a tool call

{
  "id": "chatcmpl-abc123",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"Paris\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

Follow-up with the tool result

{
  "model": "gpt-5.1-chat",
  "messages": [
    {"role": "user", "content": "What is the weather in Paris?"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"}
      }]
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp_c\": 14, \"sky\": \"cloudy\"}"}
  ]
}
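
The round trip above can be automated on the client. The following is a sketch under stated assumptions: TOOLS is a hypothetical local registry, get_weather is a stand-in that returns canned data, and tool_followup is an illustrative name. Only the message shapes come from this page.

```python
import json


def get_weather(location):
    """Stand-in tool; a real one would query a weather service."""
    return {"temp_c": 14, "sky": "cloudy"}


TOOLS = {"get_weather": get_weather}  # hypothetical registry: name -> callable


def tool_followup(messages, assistant_message):
    """Return a new history with the assistant turn and one tool message per call."""
    out = list(messages) + [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        # "arguments" is a JSON-encoded string, not an object.
        args = json.loads(call["function"]["arguments"])
        out.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the id of the call
            "content": json.dumps(fn(**args)),
        })
    return out
```

Send the returned list as messages in the next request; the model then answers in plain text using the tool output. The input history is copied, so the caller's list is left untouched.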

Assistant prefill

End your messages array with an assistant message that already contains some text, and the model continues from it instead of starting a fresh turn. This is a reliable way to force a response to begin a specific way — a leading "{" for JSON, a chosen language, or a fixed prefix. The same trick works on /v1/messages. Providers that reject native prefill are handled automatically: the gateway retries once with a compatible rewrite, so you do not have to special-case them.

{
  "model": "claude-sonnet-4.6",
  "messages": [
    {"role": "user", "content": "List three primary colors as a JSON array."},
    {"role": "assistant", "content": "["}
  ]
}

Structured outputs

Set response_format to make the model return JSON. Two modes are supported:

{ "type": "json_object" } — the response is a single valid JSON value.
{ "type": "json_schema", "json_schema": { "name", "schema", "strict" } } — the model is steered to produce JSON that matches your JSON Schema.

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "Extract the city and country: I live in Paris, France."}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "location",
        "schema": {
          "type": "object",
          "properties": { "city": {"type": "string"}, "country": {"type": "string"} },
          "required": ["city", "country"]
        }
      }
    }
  }'

Reliability: even when a model wraps its answer in prose or a markdown code fence, Airforce extracts the JSON payload so you always receive parseable content. If no valid JSON can be recovered, the original text is returned unchanged — so the guarantee never makes a response worse. This applies to non-streamed responses; streamed responses are passed through unchanged.

Streaming

Set stream: true to receive partial completions as server-sent events. Each event is a single JSON chunk with the same shape as the non-streamed response, except that message is replaced by delta. The stream ends with data: [DONE].

curl https://api.airforce/v1/chat/completions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1-chat",
    "messages": [{"role": "user", "content": "Write a haiku about Berlin."}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Wire format

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"content":"Cold "},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"content":"stone "},"finish_reason":null}]}

…

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":17,"total_tokens":29}}

data: [DONE]
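
A client reassembles the answer by concatenating delta.content across chunks. The function below is a minimal sketch of that loop over already-decoded text lines; collect_stream is a hypothetical name, and it ignores tool-call and reasoning deltas for brevity.

```python
import json


def collect_stream(lines):
    """Fold OpenAI-style SSE lines into (text, finish_reason, usage)."""
    text, finish, usage = [], None, None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and non-data lines carry no payload
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        usage = chunk.get("usage") or usage  # arrives on the final chunk
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            text.append(delta.get("content") or "")
            finish = choice.get("finish_reason") or finish
    return "".join(text), finish, usage
```

With urllib, iterating over the open HTTP response yields one bytes line at a time, so decoding each line as UTF-8 and passing the iterator to collect_stream is enough for a simple consumer.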

Reliability & smart routing

Every model ID resolves to a pool of upstream providers behind the scenes. If the first one errors or times out, the request is automatically retried against the next provider for the same model, in order, before any failure is returned — you do not configure or trigger this. The model field in the response always reports the variant that actually answered. This is independent of the optional models / fallbacks array, which adds your own cross-model candidates on top: first the primary model exhausts its own provider chain, then each fallback model exhausts its chain.

POST /v1/messages

Anthropic-compatible Messages API. Works with the official @anthropic-ai/sdk by setting baseURL to https://api.airforce. Requests for non-Claude models are routed transparently to OpenAI, Google, and other providers.

POST https://api.airforce/v1/messages

Request body

Parameter	Type	Required	Description
model	string	Required	Model ID (Anthropic format or a routed alias).
messages	array	Required	Each entry: { role: "user" \| "assistant", content: string \| array }.
max_tokens	integer	Required	Required by Anthropic. Token cap for the response.
system	string \| array	Optional	System prompt. Pass an array of { type: "text", text, cache_control? } blocks to mark cached prefix segments. See "Prompt caching".
temperature	float	Optional	0-1.
top_p	float	Optional	Nucleus sampling.
top_k	integer	Optional	Restricts the sampling pool to the top K tokens.
stop_sequences	array	Optional	Up to 4 stop sequences.
stream	boolean	Optional	When true, emits an Anthropic-style SSE event stream (see the streaming events section below).
fallbacks	array	Optional	Fallback models (max 3) in Anthropic form: [{"model": "gpt-4o-mini"}]. If every channel of the primary model fails, each candidate is tried in order; you are billed for — and the response model field reports — the model that actually answered. A plain models string array is accepted too.
tools	array	Optional	Anthropic tool definitions: { name, description, input_schema }. The response may contain tool_use content blocks.
tool_choice	object	Optional	{ type: "auto" \| "any" \| "tool", name? }.
thinking	object	Optional	Anthropic extended thinking: { type: "enabled", budget_tokens: N }.

Example

curl https://api.airforce/v1/messages \
  -H "x-api-key: sk-air-YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "max_tokens": 256,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'

Response shape

Parameter	Type	Required	Description
id	string	Optional	Message ID, e.g. "msg_01ABCxyz".
type	string	Optional	Always "message".
role	string	Optional	Always "assistant".
content	array	Optional	Array of content blocks: { type: "text" \| "tool_use" \| "thinking", … }.
model	string	Optional	ID of the model that answered. This echoes the requested model unless a fallback answered instead.
stop_reason	string	Optional	"end_turn" \| "max_tokens" \| "stop_sequence" \| "tool_use".
usage	object	Optional	{ input_tokens, output_tokens, cache_read_input_tokens?, cache_creation_input_tokens?, cache_creation? }. The cache fields appear when prompt caching is used. cache_creation.ephemeral_5m_input_tokens and ephemeral_1h_input_tokens give the write split by TTL.

Streaming events

Anthropic SSE uses named events rather than bare JSON chunks. Each event has both an event: name and a data: JSON payload.

event: message_start
data: {"type":"message_start","message":{"id":"msg_01","role":"assistant","content":[],"model":"claude-sonnet-4.6","stop_reason":null,"usage":{"input_tokens":12,"output_tokens":1}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":17}}

event: message_stop
data: {"type":"message_stop"}
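
Consuming this stream means tracking the most recent event: name and dispatching on it when the data: line arrives. The sketch below handles only text deltas and the stop reason; collect_anthropic_stream is a hypothetical name, and thinking or tool_use deltas are deliberately left out.

```python
import json


def collect_anthropic_stream(lines):
    """Fold Anthropic-style SSE lines into (text, stop_reason, output_tokens)."""
    text, stop_reason, output_tokens, event = [], None, None, None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()  # name applies to the next data line
        elif line.startswith("data:"):
            data = json.loads(line[len("data:"):].strip())
            if event == "content_block_delta":
                delta = data.get("delta", {})
                if delta.get("type") == "text_delta":
                    text.append(delta.get("text", ""))
            elif event == "message_delta":
                stop_reason = data.get("delta", {}).get("stop_reason", stop_reason)
                output_tokens = data.get("usage", {}).get("output_tokens", output_tokens)
            elif event == "message_stop":
                break
    return "".join(text), stop_reason, output_tokens
```

Unlike the OpenAI-shaped stream there is no [DONE] sentinel here; the message_stop event marks the end.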

POST /v1/messages/count_tokens

Anthropic-compatible token counting. Send the same system / messages / tools you would pass to /v1/messages and get an input-token estimate back without running the model — nothing is billed.

POST https://api.airforce/v1/messages/count_tokens

curl https://api.airforce/v1/messages/count_tokens \
  -H "x-api-key: sk-air-YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.6",
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "Hello, Claude!"}]
  }'

# → {"input_tokens": 34}

The count is a fast character-based estimate (about 4 characters per token) over system, messages and tools — close enough for context-budget checks, not an exact tokenizer run.

Prompt caching

On /v1/messages with Claude models, mark a prefix as cacheable by passing system as an array of blocks in which the cached part carries cache_control: { type: "ephemeral" }. Subsequent requests that start with the same prefix are charged the cheaper cache-read rate. Models with supports_caching: true in /v1/models support this.

Write vs read pricing

Cache writes are typically charged slightly above normal input (about 1.25× on Claude-family models). Cache reads are much cheaper (about 0.1× input). A large write with almost no later read is the expensive case — not a “cache discount”. Only reusing the same prefix turns the write into savings.

Tools like Claude Code often attach a large project context with cache markers on the first turns. Expect cache-write spend while the repo/system prefix is loaded; later turns only get cheap if that prefix is stable and reused. Subagents and multi-step agents can multiply large contexts across several requests.

{
  "model": "claude-sonnet-4.6",
  "max_tokens": 1024,
  "system": [
    {"type": "text", "text": "You are a senior staff engineer at Airforce."},
    {
      "type": "text",
      "text": "<repository-snapshot>...</repository-snapshot>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [
    {"role": "user", "content": "Where is rate limiting enforced?"}
  ]
}

How cache counts are reported in the response

Cache token counts are passed through in each format's native shape, so the SDKs (openai, @anthropic-ai/sdk, @google/genai) read them without custom code. Fields are omitted when the value is zero, which keeps uncached responses light.

/v1/chat/completions (OpenAI shape)

"usage": {
  "prompt_tokens": 2104,
  "completion_tokens": 147,
  "total_tokens": 2251,
  "prompt_tokens_details": { "cached_tokens": 1980 },
  "cache_creation_input_tokens": 124,
  "cache_creation": {
    "ephemeral_5m_input_tokens": 124,
    "ephemeral_1h_input_tokens": 0
  }
}

/v1/messages (Anthropic shape)

"usage": {
  "input_tokens": 2104,
  "output_tokens": 147,
  "cache_read_input_tokens": 1980,
  "cache_creation_input_tokens": 124,
  "cache_creation": {
    "ephemeral_5m_input_tokens": 124,
    "ephemeral_1h_input_tokens": 0
  }
}

/v1beta/.../generateContent (Gemini shape)

"usageMetadata": {
  "promptTokenCount": 2104,
  "candidatesTokenCount": 147,
  "totalTokenCount": 2251,
  "cachedContentTokenCount": 1980
}

Where caching applies

Explicit cache_control markers are honored on /v1/messages and /v1/chat/completions for Claude models; place them on system or message content blocks. Many other providers (the OpenAI family, DeepSeek, and Gemini) cache automatically: you send no markers and simply see cached_tokens in the response once a long enough prefix is reused.

Cache lifetime: 5 minutes or 1 hour

A cached prefix lives for 5 minutes by default, and the timer is refreshed on every hit. For a longer-lived prefix, add ttl: "1h" to the marker. The response reports each TTL separately under cache_creation.

"cache_control": { "type": "ephemeral", "ttl": "1h" }

Worked example: first a write, then a read

Send the exact same request twice (the caching example above). The first call that sees the prefix pays a one-time cache write; matching calls within the TTL pay a much cheaper cache read.

First call, a cache write (usage excerpt):

"usage": {
  "input_tokens": 2104,
  "output_tokens": 12,
  "cache_creation_input_tokens": 1980,
  "cache_read_input_tokens": 0
}

Second matching call within the TTL, a cache read:

"usage": {
  "input_tokens": 2104,
  "output_tokens": 12,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 1980
}

Limits and cost

Claude requires a minimum cacheable prefix (about 1024 tokens; more for some models). Shorter prefixes are simply not cached.
Up to 4 cache breakpoints per request, and the cached prefix must match byte for byte across calls; changing even a single character misses the cache.
Cache writes cost more than normal input (5m ≈ 1.25×, 1h ≈ 2×); reads are much cheaper (≈ 0.1×). See the per-model cache pricing on the pricing page.
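
To see when a cache write pays for itself, the approximate multipliers above can be plugged into a small calculation. This is a rough sketch, not a billing formula: cache_cost_units is a hypothetical name, costs are expressed in units of one plain input token, and every call after the first is assumed to hit the cache within its TTL.

```python
WRITE_MULT = {"5m": 1.25, "1h": 2.0}  # approximate write multipliers from this page
READ_MULT = 0.1                       # approximate read multiplier from this page


def cache_cost_units(prefix_tokens, calls, ttl="5m"):
    """Return (uncached, cached) prefix cost over `calls` identical requests."""
    uncached = prefix_tokens * calls
    # One write on the first call, then one read on each later call.
    cached = prefix_tokens * (WRITE_MULT[ttl] + READ_MULT * (calls - 1))
    return uncached, round(cached, 2)
```

For the 1980-token prefix from the worked example, a single call costs 2475 units with a 5-minute marker versus 1980 without, while two calls cost 2673 versus 3960. Under these multipliers the 5-minute cache wins from the second call and the 1-hour cache from the third.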

POST /v1/responses

OpenAI Responses API interface for stateful conversations. Same Bearer/x-api-key authentication. Cache counts appear as input_tokens_details.cached_tokens (reads) plus the flat cache_creation_input_tokens and cache_creation.ephemeral_* (writes), for parity with /v1/chat/completions.

POST https://api.airforce/v1/responses

POST /v1beta/models/{model}:generateContent

Google Gemini-compatible endpoint. Works with the official @google/genai SDK and the Gemini CLI by pointing the base URL at https://api.airforce/v1beta. Any routed model works — requests are translated to and from the native Gemini shape, and the model is taken from the URL path (not the body).

POST https://api.airforce/v1beta/models/{model}:generateContent

Authentication

Pass your Airforce API key any of the three ways Google clients use:

# 1) query parameter (Google default)
?key=sk-air-YOUR_API_KEY

# 2) header
x-goog-api-key: sk-air-YOUR_API_KEY

# 3) bearer token
Authorization: Bearer sk-air-YOUR_API_KEY

Request body

Parameter	Type	Required	Description
contents	array	Required	Conversation turns. Each: { role: "user" \| "model", parts: [...] }. A part is { text }, { functionCall: { name, args } }, or { functionResponse: { name, response } }. "model" is Gemini's term for the assistant role.
systemInstruction	object	Optional	System prompt: { parts: [{ text }] }.
generationConfig	object	Optional	{ temperature, maxOutputTokens, topP, stopSequences } — mapped to the canonical sampling parameters.
tools	array	Optional	Tool definitions: [{ functionDeclarations: [{ name, description, parameters }] }]. functionDeclarations are flattened across entries.
toolConfig	object	Optional	Tool-choice control: { functionCallingConfig: { mode: "AUTO" \| "ANY" \| "NONE" } }. ANY forces a call, NONE disables tools.

Example

curl "https://api.airforce/v1beta/models/gemini-3.1-pro:generateContent" \
  -H "x-goog-api-key: sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "What is the capital of France?"}]}
    ],
    "systemInstruction": {"parts": [{"text": "You are a helpful assistant."}]},
    "generationConfig": {"temperature": 0.7, "maxOutputTokens": 256}
  }'

Response shape

Parameter	Type	Required	Description
candidates	array	Optional	Generated turns: [{ content: { role: "model", parts }, finishReason, index }]. Only the first candidate is populated.
candidates[].finishReason	string	Optional	"STOP" \| "MAX_TOKENS" \| "SAFETY" \| "OTHER".
usageMetadata	object	Optional	{ promptTokenCount, candidatesTokenCount, totalTokenCount, cachedContentTokenCount? }. cachedContentTokenCount appears when the upstream reported a cache read.
modelVersion	string	Optional	Echo of the requested model.

{
  "candidates": [{
    "content": {
      "role": "model",
      "parts": [{"text": "The capital of France is Paris."}]
    },
    "finishReason": "STOP",
    "index": 0
  }],
  "usageMetadata": {
    "promptTokenCount": 16,
    "candidatesTokenCount": 8,
    "totalTokenCount": 24
  },
  "modelVersion": "gemini-3.1-pro"
}

POST /v1beta/models/{model}:streamGenerateContent

Streaming uses the :streamGenerateContent action and returns Server-Sent Events. Each data: line is a full Gemini-shaped chunk (not a delta object); the final chunk carries usageMetadata.

data: {"candidates":[{"content":{"role":"model","parts":[{"text":"The capital"}]},"index":0}],"modelVersion":"gemini-3.1-pro"}

data: {"candidates":[{"content":{"role":"model","parts":[{"text":" is Paris."}]},"index":0}],"modelVersion":"gemini-3.1-pro"}

data: {"candidates":[{"content":{"role":"model","parts":[]},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":16,"candidatesTokenCount":8,"totalTokenCount":24}}
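
Since each chunk is a full Gemini-shaped object, a consumer concatenates the text parts of the first candidate and keeps the last usageMetadata it sees. A minimal sketch follows; collect_gemini_stream is a hypothetical name, and function-call parts are skipped.

```python
import json


def collect_gemini_stream(lines):
    """Fold Gemini-shaped SSE lines into (text, finish_reason, usage_metadata)."""
    text, finish, usage = [], None, None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators between events
        chunk = json.loads(line[len("data:"):].strip())
        usage = chunk.get("usageMetadata") or usage  # carried by the final chunk
        for cand in chunk.get("candidates", []):
            for part in cand.get("content", {}).get("parts", []):
                text.append(part.get("text", ""))  # non-text parts contribute nothing
            finish = cand.get("finishReason") or finish
    return "".join(text), finish, usage
```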

List models

The catalog is also exposed in Gemini Model-resource shape so Google clients can enumerate models.

curl https://api.airforce/v1beta/models

Notes: the base URL is https://api.airforce/v1beta (or /v1), not Google's host. The model name comes from the URL path, not the request body. Only the first candidate is returned, and a subset of Gemini fields is translated — safetySettings and cachedContent are currently ignored. Billing, rate limits and smart routing apply exactly as on /v1/chat/completions.

Errors

Airforce returns standard HTTP status codes and a unified error envelope for both endpoints.

Status	Type	Description
400	invalid_request_error	Malformed JSON, a missing required field, or an unknown model.
401	invalid_request_error / auth_required	Missing or invalid API key.
402	insufficient_quota	The model requires an active subscription or a positive Pay-as-you-Go balance.
403	model_access_denied / insufficient_scope	Plan or per-key permissions deny this request.
404	model_not_found	The requested model does not exist or you do not have access to it.
429	rate_limit_error	Request rate or daily token limit exceeded.
503	api_error / moderation_unavailable	All upstream keys for the requested provider failed.

{
  "error": {
    "message": "The requested model does not exist or you do not have access to it.",
    "type": "model_not_found",
    "param": null,
    "code": "404"
  }
}

The descriptive slug is in type. code is the HTTP status as a string (e.g. "404"), and param is null except for parameter-range validation errors, where it names the offending parameter.
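
Client code usually unpacks this envelope once and decides whether a retry makes sense. The sketch below is one way to do that: parse_error is a hypothetical name, and treating 429 and 503 as retryable is an assumption of this example, not a rule stated on this page.

```python
import json

RETRYABLE = {429, 503}  # assumption: rate limits and upstream failures are transient


def parse_error(status, body_text):
    """Return (type, message, param, retryable) from an error response."""
    try:
        err = json.loads(body_text).get("error", {})
    except (ValueError, AttributeError):
        err = {}  # body was not JSON, or not a JSON object
    return (
        err.get("type", "unknown"),
        err.get("message", body_text),
        err.get("param"),
        status in RETRYABLE,
    )
```

With urllib, an HTTPError exposes the status as its code attribute and the body through read(), so both arguments are available in the except branch.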

Discover models

See the full list of model IDs and their capability flags (vision, tools, reasoning, caching, context length, ...) at /docs/api/models.

curl https://api.airforce/v1/models \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"