API REFERENCE

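Audio

Text-to-speech, speech-to-text, music, sound effects, voice changing, dubbing, and voice cloning, all with one API key across providers.

A single audio interface covers text-to-speech, transcription, music, sound effects, dubbing, voice changing, and voice cloning. The core endpoints are OpenAI-compatible, while richer advanced options (voice settings, speaker diarization, dubbing) are accepted wherever the upstream provider supports them.

Start by listing the available voices; cloned voices you create appear in the same list and are used the same way.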

Endpoints in this section: /v1/audio/speech, /music, /sound-effects, /transcriptions, /audio-isolation, /voice-changer, /dubbing, /voices, plus /v1/voices/* for cloning.

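Text to speech

Synthesises speech from text. Returns raw audio bytes with a matching Content-Type (for example audio/mpeg). PCM and µ-law formats include a WAV header, so they play in any browser.

POST https://api.airforce/v1/audio/speech

TTS models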

…· live

Parameter	Type	Required	Description
model	string	Required	TTS model ID. See /v1/models for IDs with input_modalities containing "text" and output_modalities containing "audio".
input	string	Required	Text to synthesise. Long inputs are chunked automatically.
voice	string	Required	Voice ID. Use GET /v1/audio/voices to list options. Cloned voices appear here too.
response_format	string	Optional	"mp3" (default), "mp3_44100_128", "mp3_44100_192", "pcm_22050", "pcm_24000", "pcm_44100", "ulaw_8000".
speed	float	Optional	0.25 – 4.0. OpenAI-compatible. Some upstream providers ignore this.
voice_settings	object	Optional	ElevenLabs-shape: { stability: 0–1, similarity_boost: 0–1, style: 0–1, use_speaker_boost: bool }.
language_code	string	Optional	ISO-639-1 hint, e.g. "de", "en", "ja". Improves prosody for multilingual models.
seed	integer	Optional	Reproducibility seed where supported.

Example

curl https://api.airforce/v1/audio/speech \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "elevenlabs-multilingual-v2",
    "input": "Willkommen bei Airforce.",
    "voice": "21m00Tcm4TlvDq8ikWAM",
    "response_format": "mp3_44100_128",
    "voice_settings": {"stability": 0.6, "similarity_boost": 0.8}
  }'
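
For callers who prefer Python over curl, the same request can be sketched with the standard library alone. This is a minimal sketch, not an official client: the helper names build_speech_request and synthesise_to_file are hypothetical, and the base URL, header names, and body fields are copied from the curl example above.

```python
import json
import urllib.request

API_BASE = "https://api.airforce/v1"  # base URL taken from the curl example


def build_speech_request(api_key: str, text: str, voice: str, model: str,
                         response_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/audio/speech request."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        f"{API_BASE}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesise_to_file(request: urllib.request.Request, path: str) -> str:
    """Send the request, write the raw audio bytes to path, return the Content-Type."""
    with urllib.request.urlopen(request) as response:
        content_type = response.headers.get("Content-Type", "")
        with open(path, "wb") as out:
            out.write(response.read())
    return content_type
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any characters are billed.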

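List voices

Returns every voice you can pass as the voice parameter in TTS, voiceover, and audiobook calls. Cloned voices are also returned here once their status is active.

GET https://api.airforce/v1/audio/voices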

curl https://api.airforce/v1/audio/voices \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"

Response shape

Parameter	Type	Required	Description
voices[]	array	Optional	List of voice descriptors.
voices[].voice_id	string	Optional	Provider-native voice identifier — the field is voice_id (not id). Pass this value as "voice".
voices[].name	string	Optional	Human-readable name.
voices[].description	string	Optional	Short description, when the upstream exposes one.
voices[].category	string	Optional	"premade" | "cloned" | "professional".
voices[].preview_url	string	Optional	Short audio sample, when the upstream exposes one.
voices[].labels	object	Optional	Free-form metadata: gender, language, accent, age, use case.
live	boolean	Optional	true when the catalog came from a live upstream call; false when served from the built-in premade fallback.

{
  "voices": [
    {
      "voice_id": "CwhRBWXzGAHq8TQ4Fs17",
      "name": "Roger - Laid-Back, Casual, Resonant",
      "description": "Easy going and perfect for casual conversations.",
      "preview_url": "https://.../58ee3ff5.mp3",
      "category": "premade",
      "labels": {"accent": "american", "gender": "male", "language": "en", "use_case": "conversational"}
    }
  ],
  "live": true
}
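
Because labels is free-form metadata, picking a voice usually means filtering the catalog client-side. The sketch below assumes the response shape shown above; find_voices is a hypothetical helper, and SAMPLE_CATALOG simply repeats the example response.

```python
from typing import Any

# The example response from above, reused as sample data.
SAMPLE_CATALOG: dict[str, Any] = {
    "voices": [
        {
            "voice_id": "CwhRBWXzGAHq8TQ4Fs17",
            "name": "Roger - Laid-Back, Casual, Resonant",
            "category": "premade",
            "labels": {"accent": "american", "gender": "male",
                       "language": "en", "use_case": "conversational"},
        }
    ],
    "live": True,
}


def find_voices(catalog: dict[str, Any], **labels: str) -> list[dict[str, Any]]:
    """Return the voices whose labels match every given key/value pair."""
    matches = []
    for voice in catalog.get("voices", []):
        voice_labels = voice.get("labels") or {}  # labels may be missing
        if all(voice_labels.get(key) == value for key, value in labels.items()):
            matches.append(voice)
    return matches


# Pass the voice_id field (not id) as "voice" in later calls.
english_male = [v["voice_id"] for v in find_voices(SAMPLE_CATALOG, language="en", gender="male")]
```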

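Music generation

Generates a full music track from a text prompt. Returns binary audio.

POST https://api.airforce/v1/audio/music

This endpoint serves native music models (for example music-v1). Suno models (suno-*) are not available here and return provider_not_supported; call them through the /v1/images/generations endpoint instead (see the Media reference).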

Parameter	Type	Required	Description
model	string	Required	Music model ID, e.g. "music-v1".
prompt	string	Required	Style / mood / structure description.
duration_seconds	integer	Optional	Track length. Range depends on the model (typically 15–120 s).
response_format	string	Optional	"mp3" (default) or provider-native.
instrumental	boolean	Optional	When true, suppresses vocals.
style	string	Optional	Genre tag list, e.g. "EDM, bass, dark".

curl https://api.airforce/v1/audio/music \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output track.mp3 \
  -d '{
    "model": "music-v1",
    "prompt": "Lofi hip-hop beat with soft piano and rain",
    "duration_seconds": 60,
    "instrumental": true
  }'

Sound effects

Short SFX from a text prompt. Same request shape as music, with shorter durations.

POST https://api.airforce/v1/audio/sound-effects

Parameter	Type	Required	Description
model	string	Required	SFX model ID.
prompt	string	Required	Effect description, e.g. "thunder rumble fading into rain".
duration_seconds	integer	Optional	Length, typically 0.5–22 s.
response_format	string	Optional	"mp3" (default).

curl https://api.airforce/v1/audio/sound-effects \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output thunder.mp3 \
  -d '{
    "model": "sfx-v1",
    "prompt": "Distant thunder rolling, then rain",
    "duration_seconds": 8
  }'

Transcription (speech to text)

Multipart upload of an audio file. Returns the transcribed text.

POST https://api.airforce/v1/audio/transcriptions

Transcription models

…· live

Parameter	Type	Required	Description
model	string	Required	Transcription model ID. See the live list below for valid IDs.
file	binary	Required	Audio file. Supports mp3, wav, m4a, flac, ogg, webm.
language_code	string	Optional	ISO-639-1 language hint (also accepted as "language"). Auto-detected when omitted.
diarize	boolean	Optional	Separate speakers. When true, each word carries a speaker_id.
num_speakers	integer	Optional	Expected speaker count, used together with diarize.
tag_audio_events	boolean	Optional	Mark non-speech events (laughter, silence, music) in the output.
timestamps_granularity	string	Optional	"word" (default) or "character".
additional_formats	string	Optional	Request extra rendered outputs (e.g. srt / vtt) alongside the JSON.

curl https://api.airforce/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "file=@YOUR_AUDIO.mp3" \
  -F "model=elevenlabs-scribe" \
  -F "language_code=de" \
  -F "diarize=true"
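
curl's -F flag builds a multipart/form-data body for you. Without a third-party HTTP library, that body can be assembled by hand, as in the sketch below. encode_multipart is a hypothetical helper; the field names (model, diarize, file) come from the parameter table above, and the generic application/octet-stream part type is an assumption about what the endpoint accepts.

```python
import uuid


def encode_multipart(fields: dict[str, str],
                     files: list[tuple[str, str, bytes]]) -> tuple[bytes, str]:
    """Encode text fields and (field, filename, content) files as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    for name, filename, content in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body and header value can then be passed as data and Content-Type to urllib.request.Request, together with the Authorization header used in the curl example.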

Response shape

{
  "language_code": "deu",
  "language_probability": 0.98,
  "text": "Willkommen zum Meeting...",
  "words": [
    {"text": "Willkommen", "start": 0.0, "end": 0.62, "type": "word", "logprob": -0.08, "speaker_id": "speaker_0"},
    {"text": " ", "start": 0.62, "end": 0.62, "type": "spacing", "logprob": 0.0}
  ],
  "audio_duration_secs": 412.5,
  "transcription_id": "tx_01HXY..."
}

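The response uses the upstream provider's native shape (ElevenLabs Scribe), not the OpenAI Whisper shape: tokens come back as a flat words[] array (each entry carries a type of word or spacing plus a logprob) rather than segments[]. Duration is audio_duration_secs, and language_code is ISO-639-3 (for example eng, deu). The per-word speaker_id appears only when you pass diarize=true.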
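
Since there is no segments[] array, speaker turns have to be rebuilt from the flat words[] list. The sketch below shows one way to do that for a diarized response; speaker_turns is a hypothetical helper, and the sample words extend the example response with made-up entries for illustration.

```python
from typing import Any


def speaker_turns(words: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge consecutive words from the same speaker into turns."""
    turns: list[dict[str, Any]] = []
    for entry in words:
        kind = entry.get("type")
        if kind == "word":
            speaker = entry.get("speaker_id", "unknown")  # absent without diarize=true
            if not turns or turns[-1]["speaker"] != speaker:
                turns.append({"speaker": speaker, "start": entry["start"],
                              "end": entry["end"], "text": ""})
            turns[-1]["text"] += entry["text"]
            turns[-1]["end"] = entry["end"]
        elif kind == "spacing" and turns:
            turns[-1]["text"] += entry["text"]  # keep spacing inside the current turn
        # any other entry type is skipped
    for turn in turns:
        turn["text"] = turn["text"].strip()
    return turns


# Made-up sample in the documented shape.
SAMPLE_WORDS = [
    {"text": "Willkommen", "start": 0.0, "end": 0.62, "type": "word", "speaker_id": "speaker_0"},
    {"text": " ", "start": 0.62, "end": 0.62, "type": "spacing"},
    {"text": "zum", "start": 0.62, "end": 0.9, "type": "word", "speaker_id": "speaker_0"},
    {"text": " ", "start": 0.9, "end": 0.9, "type": "spacing"},
    {"text": "Danke", "start": 1.4, "end": 1.8, "type": "word", "speaker_id": "speaker_1"},
]
```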

Audio isolation

Removes background noise from a clip while preserving the foreground voice. Multipart upload; returns audio.

POST https://api.airforce/v1/audio/audio-isolation

Parameter	Type	Required	Description
model	string	Required	Isolation model ID.
file	binary	Required	Input audio.

curl https://api.airforce/v1/audio/audio-isolation \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=isolation-v1" \
  -F "file=@YOUR_AUDIO.mp3" \
  --output clean.mp3

Voice changer (speech to speech)

Takes input speech and re-renders it in a different voice while preserving timing and intonation.

POST https://api.airforce/v1/audio/voice-changer

Parameter	Type	Required	Description
model	string	Required	Voice-change model ID.
voice	string	Required	Target voice ID. Same catalog as TTS.
file	binary	Required	Input audio.
voice_settings	object	Optional	ElevenLabs-shape settings (stability, similarity_boost, …).

curl https://api.airforce/v1/audio/voice-changer \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=voice-changer-v1" \
  -F "voice=21m00Tcm4TlvDq8ikWAM" \
  -F "file=@YOUR_AUDIO.mp3" \
  --output transformed.mp3

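Dubbing

Asynchronous dubbing into one target language. Returns a dubbing_id immediately; poll the status until it is "dubbed", then download the dubbed audio for that language.

1. Create job

POST https://api.airforce/v1/audio/dubbing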

Parameter	Type	Required	Description
model	string	Required	Dubbing model ID.
file	binary	Required	Source audio or video (mp3, wav, m4a, mp4 — audio is extracted automatically). Alternatively pass source_url.
target_lang	string	Required	Target language code (ISO-639-1). One language per job — repeating the field does not add languages.
source_lang	string	Optional	Source language. "auto" or omit for auto-detect.
num_speakers	integer	Optional	Hint for diarization. Auto when omitted.
drop_background_audio	boolean	Optional	Remove background music / noise from the dub.
watermark	boolean	Optional	Add an audible watermark to the output.

curl https://api.airforce/v1/audio/dubbing \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=dubbing-v1" \
  -F "file=@english.mp4" \
  -F "target_lang=de" \
  -F "source_lang=en"

{
  "dubbing_id": "abc123def456",
  "expected_duration_sec": 42.5
}

2. Poll status

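GET https://api.airforce/v1/audio/dubbing/:dubbing_id

Status is forwarded from the provider unchanged. status is "dubbing" while the job runs and "dubbed" once it finishes (not "completed"). Languages are listed under target_languages (not available_languages), and there is no progress field.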

{
  "dubbing_id": "abc123def456",
  "status": "dubbed",
  "source_language": "en",
  "target_languages": ["de"],
  "media_metadata": {"duration": 42.5, "content_type": "video/mp4"},
  "name": "english.mp4",
  "created_at": "2026-05-06T22:30:00Z",
  "editable": false,
  "error": null
}
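
The polling step can be sketched as a small loop that stops on "dubbed". This is an illustration under stated assumptions, not a reference client: wait_for_dub and make_status_fetcher are hypothetical names, the 5-second interval and 120-poll cap are arbitrary, and treating a non-null error field as failure is an assumption, since the failure statuses are not documented here.

```python
import json
import time
import urllib.request
from typing import Any, Callable


def make_status_fetcher(api_key: str, dubbing_id: str) -> Callable[[], dict[str, Any]]:
    """Return a function that GETs the job status from the endpoint above."""
    url = f"https://api.airforce/v1/audio/dubbing/{dubbing_id}"

    def fetch() -> dict[str, Any]:
        request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return fetch


def wait_for_dub(fetch_status: Callable[[], dict[str, Any]],
                 poll_seconds: float = 5.0, max_polls: int = 120,
                 sleep: Callable[[float], Any] = time.sleep) -> dict[str, Any]:
    """Poll until status is "dubbed"; raise on a reported error or after max_polls."""
    for _ in range(max_polls):
        job = fetch_status()
        if job.get("status") == "dubbed":  # note: not "completed"
            return job
        if job.get("error"):  # assumption: non-null error means the job failed
            raise RuntimeError(f"dubbing failed: {job['error']}")
        sleep(poll_seconds)
    raise TimeoutError("dubbing job did not finish within the polling budget")
```

A typical call would be wait_for_dub(make_status_fetcher(api_key, dubbing_id)), followed by a download for each entry in the returned target_languages.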

3. Download per language

GET https://api.airforce/v1/audio/dubbing/:dubbing_id/audio/:lang

curl https://api.airforce/v1/audio/dubbing/abc123def456/audio/de \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  --output german.mp3

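Voice cloning

Clone a voice from short audio samples and reuse it on every voice endpoint. Voice cloning requires explicit consent: fetch the current consent text, hash it, then submit the hash together with the samples.

1. Fetch consent text

GET https://api.airforce/v1/voices/consent-text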

{
  "text": "I confirm that the voice samples I am uploading are either my own voice or a voice I have explicit permission to clone…",
  "hash": "9f4b0c8d2e…"
}

2. Create the clone

POST https://api.airforce/v1/voices/clone

Parameter	Type	Required	Description
name	string	Required	Public voice name shown in the library.
description	string	Optional	Free-text description.
consent_hash	string	Required	SHA-256 of the consent paragraph. Fetch the current text via GET /v1/voices/consent-text and pass its hash field.
files	binary	Required	1–25 audio samples. Repeat the form field per file. Total ≤ 200 MB. 30 s – 3 min per clip works best.
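
The table describes consent_hash as the SHA-256 of the consent paragraph, and the consent-text response already carries a hash field. A client can simply forward that field, or recompute the digest as a cross-check, as sketched below. The encoding is an assumption: this sketch hashes the UTF-8 bytes of text and compares lowercase hex, which the reference does not spell out, so a mismatch may only mean the server canonicalises the text differently. Both function names are hypothetical.

```python
import hashlib
from typing import Any


def consent_sha256(text: str) -> str:
    """Hex SHA-256 of the consent text (assumes UTF-8 input and hex output)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def consent_hash_for_upload(consent: dict[str, Any], verify: bool = False) -> str:
    """Return the server-provided hash, optionally cross-checking it locally."""
    server_hash = consent["hash"]
    if verify and consent_sha256(consent["text"]) != server_hash.lower():
        raise ValueError("local SHA-256 does not match the hash field")
    return server_hash
```

Forwarding the server's hash field is the safer default; the local check is mainly useful for confirming that the text shown to the user is the text being consented to.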

curl https://api.airforce/v1/voices/clone \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "name=My voice" \
  -F "description=Calm, conversational" \
  -F "consent_hash=9f4b0c8d2e..." \
  -F "files=@YOUR_SAMPLE_1.mp3" \
  -F "files=@YOUR_SAMPLE_2.mp3"

{
  "voice_id": "voice_01HXY...",
  "name": "My voice",
  "status": "active",
  "created_at": "2026-05-06T22:30:00Z"
}

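A note on field names: the create response returns the new voice as voice_id, while GET /v1/voices/library lists cloned voices under provider_voice_id. Both hold the same identifier, which is the value you pass as voice.

3. List your library

GET https://api.airforce/v1/voices/library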

curl https://api.airforce/v1/voices/library \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"

Parameter	Type	Required	Description
voices[].provider_voice_id	string	Optional	Pass as "voice" on TTS / voice-changer endpoints.
voices[].status	string	Optional	"active" | "errored" | "deleting".
voices[].provider	string	Optional	Upstream that hosts the clone.
voices[].last_error	string	Optional	Set when status is "errored".
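
Before passing a clone to TTS or the voice changer, it can help to separate usable voices from failed ones. The sketch below relies only on the fields in the table above; split_library is a hypothetical helper and the sample library is made up for illustration.

```python
from typing import Any


def split_library(library: dict[str, Any]) -> tuple[list[str], dict[str, Any]]:
    """Return (active voice IDs, {errored voice ID: last_error})."""
    ready: list[str] = []
    failed: dict[str, Any] = {}
    for voice in library.get("voices", []):
        voice_id = voice.get("provider_voice_id")  # the value to pass as "voice"
        status = voice.get("status")
        if status == "active":
            ready.append(voice_id)
        elif status == "errored":
            failed[voice_id] = voice.get("last_error")
        # "deleting" voices are ignored: they are on their way out
    return ready, failed


# Made-up sample in the documented shape.
SAMPLE_LIBRARY = {
    "voices": [
        {"provider_voice_id": "voice_a", "status": "active", "provider": "elevenlabs"},
        {"provider_voice_id": "voice_b", "status": "errored", "last_error": "sample too short"},
        {"provider_voice_id": "voice_c", "status": "deleting"},
    ]
}
```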

4. Update / delete

PATCH https://api.airforce/v1/voices/clone/:id

DELETE https://api.airforce/v1/voices/clone/:id

PATCH accepts name and description in a JSON body. DELETE removes the voice both locally and at the upstream provider.

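Notes

Audio responses are returned as raw bytes with the correct Content-Type. PCM and µ-law formats are wrapped in a minimal WAV header so they play directly in the browser.
Multipart endpoints (transcriptions, isolation, voice-changer, dubbing, cloning) accept up to 200 MB per request.
Voice IDs work across providers: a cloned ElevenLabs voice can be passed directly to /v1/audio/voice-changer.
Costs are charged per character (TTS), per second (music, SFX, dubbing, voice-changer), or per audio minute (transcription) and deducted from your balance. Audio endpoints do not send an X-Cost-Cents response header; track spend in your dashboard usage log.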
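
Because PCM responses arrive wrapped in a WAV header, they can be inspected with Python's built-in wave module. The sketch below is a rough check under assumptions: describe_pcm_wav is a hypothetical helper, it covers linear PCM only (the wave module may reject µ-law payloads), and it assumes the header carries a valid data length, which a minimal header may not guarantee.

```python
import io
import wave
from typing import Any


def describe_pcm_wav(data: bytes) -> dict[str, Any]:
    """Read channel count, sample rate, and duration from a PCM WAV payload."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE payload")
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_secs": wav.getnframes() / rate,
        }


def make_silence(sample_rate: int, seconds: int) -> bytes:
    """Build a mono 16-bit silent WAV in memory, used here as stand-in audio."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)
    return buffer.getvalue()
```

With a "pcm_24000" response, the reported sample_rate would be expected to match the rate in the format name.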