Chat completions
Generate chat responses across 100+ models from a single API. Drop-in compatible with OpenAI Chat Completions, Anthropic Messages, and OpenAI Responses.
Airforce supports both the OpenAI Chat Completions and Anthropic Messages wire formats on top of the same model pool. Pick the SDK you already use and change only the base URL; non-Claude models are forwarded transparently behind both surfaces.
This page covers authentication, the request and response shapes for both surfaces, streaming, tool calling, vision, reasoning, and prompt caching. New here? Start with the basic example below, get one call working, then add streaming, tools, or caching afterwards.
Authentication
Every request requires a Bearer token (your Airforce API key). The Anthropic x-api-key header is also accepted on /v1/messages for SDK compatibility.
Authorization: Bearer sk-air-YOUR_API_KEY
# alt for /v1/messages:
x-api-key: sk-air-YOUR_API_KEY

POST /v1/chat/completions
OpenAI-compatible Chat Completions. Works with the official openai SDK by changing base_url to https://api.airforce/v1.
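As a minimal sketch of the request this endpoint expects, the following builds (but does not send) a Chat Completions call using only the Python standard library. The helper name build_chat_request is an illustrative assumption; the URL, the Bearer header, and the body fields come from this page.

```python
import json
import urllib.request

BASE_URL = "https://api.airforce/v1"  # base URL given on this page


def build_chat_request(api_key, model, messages, **params):
    """Build a POST request for /chat/completions (hypothetical helper)."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "sk-air-YOUR_API_KEY",
    "gpt-5.1-chat",
    [{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=200,
)
# Sending it is one more call: urllib.request.urlopen(req)
```

Keeping construction separate from sending makes the request easy to inspect or log before any tokens are billed.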
https://api.airforce/v1/chat/completions

Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model ID. Use GET /v1/models to find the available IDs. |
| messages | array | Required | Conversation history. Each entry has { role: "system" \| "user" \| "assistant" \| "tool", content }. Content is a string or an array of content blocks (vision, see below). |
| max_tokens | integer | Optional | Maximum number of tokens to generate. Capped at the model's max_output_tokens. |
| temperature | float | Optional | Sampling temperature, 0–2. Lower is more deterministic. The default depends on the upstream provider. |
| top_p | float | Optional | Nucleus sampling. Use temperature or top_p, not both. |
| stream | boolean | Optional | If true, the response is a stream of Server-Sent Events. See "Streaming" below. |
| models | array | Optional | Fallback models (max 3), e.g. ["deepseek-v3.2", "gpt-4o-mini"]. If every channel of the primary model fails, each candidate is tried in order. You are billed for — and response.model reports — the model that actually answered. Unknown or plan-gated candidates are skipped. With the OpenAI SDK pass it via extra_body. |
| transforms | array | Optional | Prompt transforms. Supported: ["middle-out"] — when the conversation overflows the model's context window, whole messages are dropped from the middle (system prompts, the first message and the most recent turns are kept), so long roleplay or agent histories keep working instead of erroring. Opt-in; off by default. |
| stream_options | object | Optional | { include_usage: boolean }. Usage is always included in the final streaming chunk; this field is accepted for OpenAI compatibility but cannot turn it off. |
| stop | string \| array | Optional | Up to 4 stop sequences. Generation stops as soon as one is produced. |
| tools | array | Optional | Definitions of functions the model may call. See "Tool calling" below. |
| tool_choice | string \| object | Optional | "auto" (default), "none", or { type: "function", function: { name } } to force a specific call. |
| response_format | object | Optional | { type: "json_object" } forces the model to emit valid JSON. Ignored for models that do not support it. |
| reasoning_effort | string | Optional | OpenAI o1/o3-style reasoning depth: "low" \| "medium" \| "high". See "Reasoning & thinking". |
| thinking | string \| object | Optional | Cross-provider thinking switch: "on" \| "off" \| "auto", or the Anthropic form { type: "enabled", budget_tokens: N }. See "Reasoning & thinking". |
| thinking_budget | integer | Optional | Token cap for the model's reasoning trace (when the provider exposes it). |
| ignore_defaults | boolean | Optional | Skip the user's saved per-model default parameters (configured in the dashboard) for this request. |
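The models and transforms rows describe Airforce-specific routing options. A small sketch of attaching them to a request body follows; the helper name with_routing_options is hypothetical, and the 3-model limit and the "middle-out" value are taken from the table.

```python
def with_routing_options(payload, fallbacks=(), middle_out=False):
    """Return a copy of payload with `models` / `transforms` set (hypothetical helper)."""
    if len(fallbacks) > 3:
        raise ValueError("at most 3 fallback models are allowed")
    out = dict(payload)
    if fallbacks:
        out["models"] = list(fallbacks)  # tried in order if the primary model fails
    if middle_out:
        out["transforms"] = ["middle-out"]  # opt-in; off by default
    return out


payload = with_routing_options(
    {"model": "gpt-5.1-chat", "messages": [{"role": "user", "content": "Hi"}]},
    fallbacks=["deepseek-v3.2", "gpt-4o-mini"],
    middle_out=True,
)
```

With the OpenAI Python SDK, the same two fields would go through the extra_body argument rather than as named parameters, as the table notes for models.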
Basic example
curl https://api.airforce/v1/chat/completions \
-H "Authorization: Bearer sk-air-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.1-chat",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"}
],
"max_tokens": 200,
"temperature": 0.7
}'

Response shape
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Optional | Stable completion ID, e.g. "chatcmpl-abc123". |
| object | string | Optional | "chat.completion" for non-streaming, "chat.completion.chunk" for streaming. |
| created | integer | Optional | Unix timestamp (seconds). |
| model | string | Optional | Echo of the requested model ID. |
| choices | array | Optional | Array of completion candidates: [{ index, message: { role, content, tool_calls? }, finish_reason }]. |
| choices[].finish_reason | string | Optional | "stop" \| "length" \| "tool_calls" \| "content_filter". |
| usage | object | Optional | { prompt_tokens, completion_tokens, total_tokens, completion_tokens_details?, prompt_tokens_details?, cache_creation_input_tokens?, cache_creation? }. completion_tokens_details.reasoning_tokens is set when the model produces a reasoning trace. Cache fields appear when the upstream returns prompt-caching info: prompt_tokens_details.cached_tokens reports cache reads (the OpenAI standard), cache_creation_input_tokens is the combined write count, and cache_creation.ephemeral_5m_input_tokens / ephemeral_1h_input_tokens give the per-TTL split. |
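To illustrate how these fields are typically consumed, here is a short sketch that reads the first choice and the token counts from a decoded response. The function name summarize_completion is an assumption for illustration; the sample dict mirrors the example response on this page.

```python
def summarize_completion(response):
    """Pull the first choice's text, finish reason, and token counts from a response dict."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    details = usage.get("completion_tokens_details") or {}
    return {
        "text": choice["message"].get("content"),
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
        # Present only when the model produced a reasoning trace.
        "reasoning_tokens": details.get("reasoning_tokens"),
    }


sample = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "model": "gpt-5.1-chat",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "The capital of France is Paris."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 20, "completion_tokens": 8, "total_tokens": 28},
}
summary = summarize_completion(sample)
```

Optional fields are read with .get so that responses without reasoning or cache details do not raise.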
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1710000000,
"model": "gpt-5.1-chat",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 20,
"completion_tokens": 8,
"total_tokens": 28
}
}

Reasoning & thinking
Models that support extended reasoning expose a thinking trace alongside their regular output. Airforce normalizes three different upstream conventions into one canonical parameter set that works everywhere.
Check for supports_reasoning: true on a model in GET /v1/models to see which IDs accept these parameters.
Canonical parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| reasoning_effort | string | Optional | "low" \| "medium" \| "high". OpenAI o1/o3, GPT-5 reasoning models, and any router that maps it. |
| thinking | string \| object | Optional | "on" \| "off" \| "auto" for a quick toggle, or { type: "enabled", budget_tokens: N } for the Anthropic-native form. Maps to Claude extended thinking, Gemini thinking, and DeepSeek reasoning. |
| thinking_budget | integer | Optional | Maximum number of tokens the model may spend thinking before it emits visible output. Mirrors budget_tokens. |
Reasoning effort (OpenAI style)
curl https://api.airforce/v1/chat/completions \
-H "Authorization: Bearer sk-air-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "o3-mini",
"messages": [{"role": "user", "content": "Prove the Pythagorean theorem."}],
"reasoning_effort": "high"
}'

Extended thinking (Anthropic style)
curl https://api.airforce/v1/chat/completions \
-H "Authorization: Bearer sk-air-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4.6",
"messages": [{"role": "user", "content": "Plan a 7-day Italy trip."}],
"thinking": {"type": "enabled", "budget_tokens": 4000}
}'
The reasoning trace itself appears in choices[0].message.reasoning (OpenAI shape) or as thinking blocks inside content (Anthropic shape). Reasoning tokens are billed and reported in usage.completion_tokens_details.reasoning_tokens.
The completion_tokens_details.reasoning_tokens breakdown only appears when the upstream provider reports it. In streamed responses, the trace arrives in delta.reasoning_content on each chunk.
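Because the trace can arrive in two shapes, client code may want a single accessor. The sketch below handles both; extract_reasoning is a hypothetical name, and the "thinking" key inside a thinking block follows Anthropic's block format, which is an assumption here since this page does not show such a block.

```python
def extract_reasoning(message):
    """Return the reasoning trace from an assistant message in either shape, or None."""
    # OpenAI shape: a plain string on message.reasoning
    if isinstance(message.get("reasoning"), str):
        return message["reasoning"]
    # Anthropic shape: content is a list of blocks, some with type "thinking"
    content = message.get("content")
    if isinstance(content, list):
        parts = [b.get("thinking", "") for b in content if b.get("type") == "thinking"]
        if parts:
            return "".join(parts)
    return None


openai_style = {"role": "assistant", "content": "42", "reasoning": "Multiply 6 by 7."}
anthropic_style = {
    "role": "assistant",
    "content": [
        {"type": "thinking", "thinking": "Multiply 6 by 7."},
        {"type": "text", "text": "42"},
    ],
}
```

Both sample messages yield the same trace, and a message without any reasoning yields None.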
Vision & image input
Models with supports_vision: true accept images embedded as content blocks. Public URLs and base64 data URLs both work; size limits depend on the upstream model.
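For local files, the base64 route means wrapping the bytes in a data URL. A minimal sketch, assuming the image_url block shape used in the vision example on this page (the helper name image_block_from_bytes is illustrative):

```python
import base64


def image_block_from_bytes(data, mime_type="image/jpeg"):
    """Wrap raw image bytes in an image_url content block using a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# In practice `data` would come from open(path, "rb").read().
block = image_block_from_bytes(b"abc", "image/png")
message = {
    "role": "user",
    "content": [{"type": "text", "text": "What is in this image?"}, block],
}
```

Base64 inflates the payload by roughly a third, so large images are more likely to hit the upstream size limits than the same image passed by public URL.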
curl https://api.airforce/v1/chat/completions \
-H "Authorization: Bearer sk-air-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.1-chat",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "What is in this image?"},
{"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
]
}]
}'

Tool calling
Models with supports_tools: true can call functions you define. The model returns a tool_calls array; you execute the calls, then send the results back in tool messages.
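The execute-and-reply half of that loop can be sketched as follows. The get_weather implementation and the TOOLS registry are stand-ins invented for illustration; the message shapes follow the tool-calling examples on this page.

```python
import json


def get_weather(location):
    # Stand-in implementation; a real tool would query a weather service.
    return {"temp_c": 14, "sky": "cloudy"}


TOOLS = {"get_weather": get_weather}  # hypothetical registry: tool name -> callable


def run_tool_calls(assistant_message):
    """Execute each tool call and return the messages to append to the history."""
    follow_up = [assistant_message]  # the assistant turn must precede its tool results
    for call in assistant_message.get("tool_calls") or []:
        func = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        follow_up.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(func(**args)),
        })
    return follow_up


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location":"Paris"}'},
    }],
}
follow_up = run_tool_calls(assistant_message)
```

Appending follow_up to the original messages and sending the request again lets the model produce its final answer from the tool output.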
Request
curl https://api.airforce/v1/chat/completions \
-H "Authorization: Bearer sk-air-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.1-chat",
"messages": [{"role": "user", "content": "What is the weather in Paris?"}],
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "City name"}
},
"required": ["location"]
}
}
}],
"tool_choice": "auto"
}'

Response with a tool call
{
"id": "chatcmpl-abc123",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": null,
"tool_calls": [{
"id": "call_1",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"location\":\"Paris\"}"
}
}]
},
"finish_reason": "tool_calls"
}]
}

Follow up with the tool result
{
"model": "gpt-5.1-chat",
"messages": [
{"role": "user", "content": "What is the weather in Paris?"},
{
"role": "assistant",
"content": null,
"tool_calls": [{
"id": "call_1",
"type": "function",
"function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"}
}]
},
{"role": "tool", "tool_call_id": "call_1", "content": "{\"temp_c\": 14, \"sky\": \"cloudy\"}"}
]
}

Structured outputs
Set response_format to make the model return JSON. Two modes are supported:
- { "type": "json_object" }: the response is a single valid JSON value.
- { "type": "json_schema", "json_schema": { "name", "schema", "strict" } }: the model is steered to produce JSON that matches your JSON Schema.
curl https://api.airforce/v1/chat/completions \
-H "Authorization: Bearer sk-air-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.1-chat",
"messages": [{"role": "user", "content": "Extract the city and country: I live in Paris, France."}],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "location",
"schema": {
"type": "object",
"properties": { "city": {"type": "string"}, "country": {"type": "string"} },
"required": ["city", "country"]
}
}
}
}'
Reliability: even when a model wraps its answer in prose or a markdown code fence, Airforce extracts the JSON payload so you receive parseable content whenever one can be recovered. If no valid JSON can be recovered, the original text is returned unchanged, so the extraction never makes a response worse. This applies to non-streamed responses; streamed responses are passed through unchanged.
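Since streamed responses skip that extraction, a client that streams may want a comparable fallback once the full text is assembled. The sketch below is one possible client-side approach, not Airforce's actual algorithm; the function name extract_json and the matching rules are assumptions.

```python
import json
import re

# Matches a markdown code fence, optionally tagged "json".
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json(text):
    """Return the parsed JSON payload if one can be recovered, else the original text."""
    candidates = [text.strip()]
    candidates += [match.strip() for match in FENCE.findall(text)]
    # Fall back to the outermost {...} span inside surrounding prose.
    start, end = text.find("{"), text.rfind("}")
    if 0 <= start < end:
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except ValueError:
            continue
    return text


fence = "`" * 3
fenced = f'Here you go:\n{fence}json\n{{"city": "Paris", "country": "France"}}\n{fence}'
result = extract_json(fenced)
```

Like the server-side behaviour described above, the function returns the input untouched when nothing parses, so callers should check whether they got a string back.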
Streaming
Set stream: true to receive partial completions as Server-Sent Events. Each event is a JSON chunk with the same shape as the non-streaming response, except that message is replaced by delta. The stream ends with data: [DONE].
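A sketch of consuming such a stream: the function below takes already-decoded SSE lines, concatenates delta.content, and keeps the last usage object it sees. The name collect_stream and the shortened sample chunks are illustrative assumptions; a real client would feed it lines read from the HTTP response.

```python
import json


def collect_stream(lines):
    """Accumulate delta.content from SSE lines; return (text, usage)."""
    text, usage = [], None
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separator lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        usage = chunk.get("usage") or usage
        for choice in chunk.get("choices", []):
            text.append(choice.get("delta", {}).get("content") or "")
    return "".join(text), usage


sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Cold "},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"stone "},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":12,"completion_tokens":17,"total_tokens":29}}',
    "data: [DONE]",
]
text, usage = collect_stream(sample_lines)
```

The first chunk carries only the role and the last carries an empty delta, so both contribute an empty string to the result.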
curl https://api.airforce/v1/chat/completions \
-H "Authorization: Bearer sk-air-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.1-chat",
"messages": [{"role": "user", "content": "Write a haiku about Berlin."}],
"stream": true,
"stream_options": {"include_usage": true}
}'

Wire format
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"content":"Cold "},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{"content":"stone "},"finish_reason":null}]}
…
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1710000000,"model":"gpt-5.1-chat","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":17,"total_tokens":29}}
data: [DONE]

POST /v1/messages
Anthropic-compatible Messages API. Works with the official @anthropic-ai/sdk by setting baseURL to https://api.airforce. Requests for non-Claude models are forwarded transparently to OpenAI, Google, and other providers.
https://api.airforce/v1/messages

Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model ID (Anthropic format or a routed alias). |
| messages | array | Required | Each entry: { role: "user" \| "assistant", content: string \| array }. |
| max_tokens | integer | Required | Required by Anthropic. Token cap for the response. |
| system | string \| array | Optional | System prompt. Pass an array of { type: "text", text, cache_control? } blocks to mark cached prefix segments. See "Prompt caching". |
| temperature | float | Optional | 0–1. |
| top_p | float | Optional | Nucleus sampling. |
| top_k | integer | Optional | Restrict sampling to the top K tokens. |
| stop_sequences | array | Optional | Up to 4 stop sequences. |
| stream | boolean | Optional | If true, emits an Anthropic-style SSE event stream (see "Streaming"). |
| fallbacks | array | Optional | Fallback models (max 3) in Anthropic form: [{"model": "gpt-4o-mini"}]. If every channel of the primary model fails, each candidate is tried in order; you are billed for — and the response model field reports — the model that actually answered. A plain models string array is accepted too. |
| tools | array | Optional | Anthropic tool definitions: { name, description, input_schema }. The response may contain tool_use content blocks. |
| tool_choice | object | Optional | { type: "auto" \| "any" \| "tool", name? }. |
| thinking | object | Optional | Anthropic extended thinking: { type: "enabled", budget_tokens: N }. |
Example
curl https://api.airforce/v1/messages \
-H "x-api-key: sk-air-YOUR_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4.6",
"max_tokens": 256,
"system": "You are a helpful assistant.",
"messages": [
{"role": "user", "content": "Hello, Claude!"}
]
}'

Response shape
| Parameter | Type | Required | Description |
|---|---|---|---|
| id | string | Optional | Message ID, e.g. "msg_01ABCxyz". |
| type | string | Optional | Always "message". |
| role | string | Optional | Always "assistant". |
| content | array | Optional | Array of content blocks: { type: "text" \| "tool_use" \| "thinking", … }. |
| model | string | Optional | Echo of the requested model. |
| stop_reason | string | Optional | "end_turn" \| "max_tokens" \| "stop_sequence" \| "tool_use". |
| usage | object | Optional | { input_tokens, output_tokens, cache_read_input_tokens?, cache_creation_input_tokens?, cache_creation? }. Cache fields appear when prompt caching is used. cache_creation.ephemeral_5m_input_tokens and ephemeral_1h_input_tokens give the per-TTL split of writes. |
Streaming events
Anthropic SSE uses named events rather than a single kind of JSON chunk. Each event has both an event: name and a data: JSON payload.
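A sketch of reading that event stream: the function pairs each event: line with the data: line that follows it, joins the text_delta payloads, and records the stop reason. The name collect_message_text is hypothetical, and the sample lines are abbreviated from the events shown on this page.

```python
import json


def collect_message_text(lines):
    """Join text_delta payloads from Anthropic-style SSE lines; return (text, stop_reason)."""
    text, stop_reason, event = [], None, None
    for line in lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = json.loads(line[len("data:"):])
            if event == "content_block_delta" and data["delta"].get("type") == "text_delta":
                text.append(data["delta"]["text"])
            elif event == "message_delta":
                stop_reason = data["delta"].get("stop_reason", stop_reason)
            elif event == "message_stop":
                break
    return "".join(text), stop_reason


sample_events = [
    "event: content_block_start",
    'data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}',
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}',
    "event: content_block_stop",
    'data: {"type":"content_block_stop","index":0}',
    "event: message_delta",
    'data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":17}}',
    "event: message_stop",
    'data: {"type":"message_stop"}',
]
message_text, stop_reason = collect_message_text(sample_events)
```

Only text deltas are collected here; a fuller client would also handle the thinking and tool_use block types listed in the response table.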
event: message_start
data: {"type":"message_start","message":{"id":"msg_01","role":"assistant","content":[],"model":"claude-sonnet-4.6","stop_reason":null,"usage":{"input_tokens":12,"output_tokens":1}}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":17}}
event: message_stop
data: {"type":"message_stop"}

POST /v1/messages/count_tokens
Anthropic-compatible token counting. Send the same system / messages / tools you would pass to /v1/messages and get an input-token estimate back without running the model — nothing is billed.
https://api.airforce/v1/messages/count_tokens
curl https://api.airforce/v1/messages/count_tokens \
-H "x-api-key: sk-air-YOUR_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4.6",
"system": "You are a helpful assistant.",
"messages": [{"role": "user", "content": "Hello, Claude!"}]
}'
# → {"input_tokens": 34}
The count is a fast character-based estimate (about 4 characters per token) over system, messages and tools. It is close enough for context-budget checks but is not an exact tokenizer run.
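The same 4-characters-per-token heuristic can be approximated locally for quick budget checks without a network round trip. This is only a sketch: the rounding, the JSON serialization of non-string content, and the function name estimate_tokens are assumptions, and the result will generally differ from the endpoint's figure, which may include overhead this sketch does not model.

```python
import json


def estimate_tokens(system="", messages=(), tools=()):
    """Rough local estimate: total characters divided by 4, rounded up."""
    chars = len(system)
    for message in messages:
        content = message["content"]
        # Block-array content is measured by its JSON serialization (an assumption).
        chars += len(content if isinstance(content, str) else json.dumps(content))
    chars += sum(len(json.dumps(tool)) for tool in tools)
    return -(-chars // 4)  # ceiling division


estimate = estimate_tokens(
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hello, Claude!"}],
)
```

For an authoritative number, call /v1/messages/count_tokens itself; a local estimate is mainly useful for deciding when a history is getting close to the context window.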
Prompt caching
On /v1/messages with Claude models, mark a prefix as cacheable by passing system as an array of blocks in which the cached segment carries cache_control: { type: "ephemeral" }. Subsequent requests that start with the same prefix are charged the cheaper cache-read rate. Models with supports_caching: true in /v1/models support this.
{
"model": "claude-sonnet-4.6",
"max_tokens": 1024,
"system": [
{"type": "text", "text": "You are a senior staff engineer at Airforce."},
{
"type": "text",
"text": "<repository-snapshot>...</repository-snapshot>",
"cache_control": {"type": "ephemeral"}
}
],
"messages": [
{"role": "user", "content": "Where is rate limiting enforced?"}
]
}

How cache counts are reported in the response
Cache token counts are passed through in each format's native shape, so the SDKs (openai, @anthropic-ai/sdk, @google/genai) read them without custom code. Fields are omitted when their value is zero, which keeps non-cached responses lean.
/v1/chat/completions (OpenAI shape)
"usage": {
"prompt_tokens": 2104,
"completion_tokens": 147,
"total_tokens": 2251,
"prompt_tokens_details": { "cached_tokens": 1980 },
"cache_creation_input_tokens": 124,
"cache_creation": {
"ephemeral_5m_input_tokens": 124,
"ephemeral_1h_input_tokens": 0
}
}

/v1/messages (Anthropic shape)
"usage": {
"input_tokens": 2104,
"output_tokens": 147,
"cache_read_input_tokens": 1980,
"cache_creation_input_tokens": 124,
"cache_creation": {
"ephemeral_5m_input_tokens": 124,
"ephemeral_1h_input_tokens": 0
}
}

/v1beta/.../generateContent (Gemini shape)
"usageMetadata": {
"promptTokenCount": 2104,
"candidatesTokenCount": 147,
"totalTokenCount": 2251,
"cachedContentTokenCount": 1980
}

Where caching applies
Explicit cache_control markers are honored on /v1/messages and /v1/chat/completions for Claude models; attach them to system or message content blocks. Many other providers (the OpenAI family, DeepSeek, Gemini) cache automatically: you send no marker and simply see cached_tokens in the response once a long enough prefix is reused.
Cache duration: 5 minutes or 1 hour
A cached prefix lasts 5 minutes by default, and the timer is refreshed on every hit. For a longer-lived prefix, add ttl: "1h" to the marker. The response reports each TTL separately under cache_creation.
"cache_control": { "type": "ephemeral", "ttl": "1h" }

Example: write first, then read
Send the exact same request twice (the caching example above). The first call to see the prefix pays a one-time cache write; an identical call within the TTL pays the much cheaper cache read.
First call, cache write (usage excerpt):
"usage": {
"input_tokens": 2104,
"output_tokens": 12,
"cache_creation_input_tokens": 1980,
"cache_read_input_tokens": 0
}
Second identical call within the TTL, cache read:
"usage": {
"input_tokens": 2104,
"output_tokens": 12,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 1980
}

Limits & costs
- Claude requires a minimum cacheable prefix (about 1024 tokens; larger for some models). Shorter prefixes are not cached.
- Up to 4 cache breakpoints per request, and the cached prefix must be byte-for-byte identical between calls; even a one-character change misses the cache.
- Cache writes cost more than regular input (5m ≈ 1.25×, 1h ≈ 2×); reads are much cheaper (≈ 0.1×). See each model's cache pricing on the pricing page.
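To make the multipliers concrete, the sketch below converts an Anthropic-shape usage object into base-input-token equivalents. It assumes that input_tokens is the total and that the cached counts are subsets of it, which the usage excerpts on this page suggest but do not state; the approximate multipliers are the ones listed above, and the function name input_cost_units is hypothetical.

```python
# Approximate multipliers from the list above, in hundredths of the base input price.
WRITE_5M, WRITE_1H, READ = 125, 200, 10


def input_cost_units(usage):
    """Input cost in base-input-token equivalents for an Anthropic-shape usage dict."""
    read = usage.get("cache_read_input_tokens", 0)
    written = usage.get("cache_creation_input_tokens", 0)
    split = usage.get("cache_creation") or {}
    write_1h = split.get("ephemeral_1h_input_tokens", 0)
    write_5m = written - write_1h  # without a split, treat every write as 5m
    # Assumption: input_tokens is the total, with cached counts as subsets.
    uncached = usage["input_tokens"] - read - written
    return (uncached * 100 + write_5m * WRITE_5M + write_1h * WRITE_1H + read * READ) / 100


first_call = {"input_tokens": 2104, "output_tokens": 12,
              "cache_creation_input_tokens": 1980, "cache_read_input_tokens": 0}
second_call = {"input_tokens": 2104, "output_tokens": 12,
               "cache_creation_input_tokens": 0, "cache_read_input_tokens": 1980}
first_cost = input_cost_units(first_call)    # 124 + 1980 * 1.25
second_cost = input_cost_units(second_call)  # 124 + 1980 * 0.1
```

Under these assumptions the write call costs the equivalent of 2599 input tokens and each later read costs 322, so the cache pays for itself from the second call on.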
POST /v1/responses
OpenAI Responses-API surface for stateful conversations. Same Bearer/x-api-key authentication. Cache counts appear as input_tokens_details.cached_tokens (reads) plus a flat cache_creation_input_tokens and cache_creation.ephemeral_* (writes), for parity with /v1/chat/completions.
https://api.airforce/v1/responses

POST /v1beta/models/{model}:generateContent
Google Gemini-compatible endpoint. Works with the official @google/genai SDK and the Gemini CLI by pointing the base URL at https://api.airforce/v1beta. Any routed model works — requests are translated to and from the native Gemini shape, and the model is taken from the URL path (not the body).
https://api.airforce/v1beta/models/{model}:generateContent

Authentication
Pass your Airforce API key any of the three ways Google clients use:
# 1) query parameter (Google default)
?key=sk-air-YOUR_API_KEY
# 2) header
x-goog-api-key: sk-air-YOUR_API_KEY
# 3) bearer token
Authorization: Bearer sk-air-YOUR_API_KEY

Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| contents | array | Required | Conversation turns. Each: { role: "user" \| "model", parts: [...] }. A part is { text }, { functionCall: { name, args } }, or { functionResponse: { name, response } }. "model" is Gemini's term for the assistant role. |
| systemInstruction | object | Optional | System prompt: { parts: [{ text }] }. |
| generationConfig | object | Optional | { temperature, maxOutputTokens, topP, stopSequences } — mapped to the canonical sampling parameters. |
| tools | array | Optional | Tool definitions: [{ functionDeclarations: [{ name, description, parameters }] }]. functionDeclarations are flattened across entries. |
| toolConfig | object | Optional | Tool-choice control: { functionCallingConfig: { mode: "AUTO" \| "ANY" \| "NONE" } }. ANY forces a call, NONE disables tools. |
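For code that already holds an OpenAI-style message list, the mapping onto this request body can be sketched as below. The helper to_gemini_request is hypothetical and handles only plain-text content; the field names come from the table above.

```python
def to_gemini_request(messages, temperature=None, max_output_tokens=None):
    """Convert OpenAI-style chat messages into a generateContent body (text only)."""
    body = {"contents": []}
    system_parts = []
    for message in messages:
        part = {"text": message["content"]}
        if message["role"] == "system":
            system_parts.append(part)
        else:
            # Gemini calls the assistant role "model".
            role = "model" if message["role"] == "assistant" else "user"
            body["contents"].append({"role": role, "parts": [part]})
    if system_parts:
        body["systemInstruction"] = {"parts": system_parts}
    config = {"temperature": temperature, "maxOutputTokens": max_output_tokens}
    config = {key: value for key, value in config.items() if value is not None}
    if config:
        body["generationConfig"] = config
    return body


gemini_body = to_gemini_request(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.7,
    max_output_tokens=256,
)
```

The model name is deliberately absent from the body, since this endpoint takes it from the URL path.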
Example
curl "https://api.airforce/v1beta/models/gemini-3.1-pro:generateContent" \
-H "x-goog-api-key: sk-air-YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [
{"role": "user", "parts": [{"text": "What is the capital of France?"}]}
],
"systemInstruction": {"parts": [{"text": "You are a helpful assistant."}]},
"generationConfig": {"temperature": 0.7, "maxOutputTokens": 256}
}'

Response shape
| Parameter | Type | Required | Description |
|---|---|---|---|
| candidates | array | Optional | Generated turns: [{ content: { role: "model", parts }, finishReason, index }]. Only the first candidate is populated. |
| candidates[].finishReason | string | Optional | "STOP" \| "MAX_TOKENS" \| "SAFETY" \| "OTHER". |
| usageMetadata | object | Optional | { promptTokenCount, candidatesTokenCount, totalTokenCount, cachedContentTokenCount? }. cachedContentTokenCount appears when the upstream reported a cache read. |
| modelVersion | string | Optional | Echo of the requested model. |
{
"candidates": [{
"content": {
"role": "model",
"parts": [{"text": "The capital of France is Paris."}]
},
"finishReason": "STOP",
"index": 0
}],
"usageMetadata": {
"promptTokenCount": 16,
"candidatesTokenCount": 8,
"totalTokenCount": 24
},
"modelVersion": "gemini-3.1-pro"
}

POST /v1beta/models/{model}:streamGenerateContent
Streaming uses the :streamGenerateContent action and returns Server-Sent Events. Each data: line is a full Gemini-shaped chunk (not a delta object); the final chunk carries usageMetadata.
data: {"candidates":[{"content":{"role":"model","parts":[{"text":"The capital"}]},"index":0}],"modelVersion":"gemini-3.1-pro"}
data: {"candidates":[{"content":{"role":"model","parts":[{"text":" is Paris."}]},"index":0}],"modelVersion":"gemini-3.1-pro"}
data: {"candidates":[{"content":{"role":"model","parts":[]},"finishReason":"STOP","index":0}],"usageMetadata":{"promptTokenCount":16,"candidatesTokenCount":8,"totalTokenCount":24}}

List models
The catalog is also exposed in Gemini Model-resource shape so Google clients can enumerate models.
curl https://api.airforce/v1beta/models
Notes: the base URL is https://api.airforce/v1beta (or /v1), not Google's host. The model name comes from the URL path, not the request body. Only the first candidate is returned, and only a subset of Gemini fields is translated: safetySettings and cachedContent are currently ignored. Billing, rate limits and smart routing apply exactly as on /v1/chat/completions.
Errors
Airforce returns standard HTTP status codes and a uniform error envelope for both endpoints.
| Status | Type | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed JSON, a missing required field, or an unknown model. |
| 401 | invalid_request_error / auth_required | Missing or invalid API key. |
| 402 | insufficient_quota | The model requires an active subscription or a positive Pay-as-you-Go balance. |
| 403 | model_access_denied / insufficient_scope | Plan or per-key permissions deny this request. |
| 404 | model_not_found | The requested model does not exist or you do not have access to it. |
| 429 | rate_limit_error | Request rate or daily token limit exceeded. |
| 503 | api_error / moderation_unavailable | All upstream keys for the requested provider failed. |
{
"error": {
"message": "The requested model does not exist or you do not have access to it.",
"type": "model_not_found",
"param": null,
"code": "404"
}
}
The descriptive slug lives in type. code is the HTTP status as a string (e.g. "404"), and param is null except on parameter-range validation errors, where it names the offending parameter.
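A sketch of handling that envelope on the client: the function below pulls out type and code and flags the statuses that are usually worth retrying. Treating 429 and 503 as retryable is an assumption based on their descriptions in the table, not a documented Airforce policy, and describe_error is a hypothetical name.

```python
RETRYABLE_STATUSES = {"429", "503"}  # assumption: rate limits and upstream failures are transient


def describe_error(body):
    """Return (type, code, retryable) from an error envelope dict."""
    error = body["error"]
    code = str(error.get("code"))  # code is the HTTP status as a string
    return error.get("type"), code, code in RETRYABLE_STATUSES


envelope = {
    "error": {
        "message": "The requested model does not exist or you do not have access to it.",
        "type": "model_not_found",
        "param": None,
        "code": "404",
    }
}
error_type, error_code, retryable = describe_error(envelope)
```

Branching on type rather than on message keeps the handling stable if the human-readable text changes.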
Discover models
See the full list of model IDs and their capability flags (vision, tools, reasoning, caching, context length, …) at /docs/api/models.
curl https://api.airforce/v1/models \
-H "Authorization: Bearer sk-air-YOUR_API_KEY"