API REFERENCE

Audio

Text-to-speech, transcription, music, sound effects, voice changing, dubbing, and voice cloning, all with a single API key across providers.

A single audio surface covers text-to-speech, transcription, music, sound effects, dubbing, voice changing, and voice cloning. The core endpoints are OpenAI-compatible; the more advanced extras (voice settings, speaker diarization, dubbing) are accepted wherever the upstream provider supports them.

Start by listing the available voices. Cloned voices you create appear in the same list and work the same way.

Endpoints in this section: /v1/audio/speech, /music, /sound-effects, /transcriptions, /audio-isolation, /voice-changer, /dubbing, /voices, plus /v1/voices/* for cloning.

Text-to-speech

Synthesises speech from text. Returns raw audio bytes with a matching Content-Type (e.g. audio/mpeg). PCM and µ-law formats include a WAV header, so they play in any browser.

POST https://api.airforce/v1/audio/speech

TTS models

…· live

Parameter	Type	Required	Description
model	string	Required	TTS model ID. See /v1/models for IDs with input_modalities containing "text" and output_modalities containing "audio".
input	string	Required	Text to synthesise. Long inputs are chunked automatically.
voice	string	Required	Voice ID. Use GET /v1/audio/voices to list options. Cloned voices appear here too.
response_format	string	Optional	"mp3" (default), "mp3_44100_128", "mp3_44100_192", "pcm_22050", "pcm_24000", "pcm_44100", "ulaw_8000".
speed	float	Optional	0.25 – 4.0. OpenAI-compatible. Some upstream providers ignore this.
voice_settings	object	Optional	ElevenLabs-shape: { stability: 0–1, similarity_boost: 0–1, style: 0–1, use_speaker_boost: bool }.
language_code	string	Optional	ISO-639-1 hint, e.g. "de", "en", "ja". Improves prosody for multilingual models.
seed	integer	Optional	Reproducibility seed where supported.

Example

curl https://api.airforce/v1/audio/speech \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "elevenlabs-multilingual-v2",
    "input": "Willkommen bei Airforce.",
    "voice": "21m00Tcm4TlvDq8ikWAM",
    "response_format": "mp3_44100_128",
    "voice_settings": {"stability": 0.6, "similarity_boost": 0.8}
  }'
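
The same request can be sketched in Python. This is a minimal illustration using only the standard library, not an official client: the helper names build_speech_request and synthesise_to_file are hypothetical, and the client-side range checks simply mirror the ranges in the parameter table above.

```python
import json
import urllib.request

API_URL = "https://api.airforce/v1/audio/speech"  # endpoint from this section


def build_speech_request(api_key, model, text, voice, response_format="mp3",
                         speed=None, voice_settings=None):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {"model": model, "input": text, "voice": voice,
               "response_format": response_format}
    if speed is not None:
        # Documented range is 0.25 to 4.0; checking it client-side is our choice.
        if not 0.25 <= speed <= 4.0:
            raise ValueError("speed must be between 0.25 and 4.0")
        payload["speed"] = speed
    if voice_settings is not None:
        # The numeric voice settings are documented as 0 to 1.
        for key in ("stability", "similarity_boost", "style"):
            value = voice_settings.get(key)
            if value is not None and not 0 <= value <= 1:
                raise ValueError(f"{key} must be between 0 and 1")
        payload["voice_settings"] = voice_settings
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def synthesise_to_file(request, path):
    """Send the request and write the raw audio bytes to disk."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling synthesise_to_file(build_speech_request(...), "speech.mp3") would perform the actual network call; building the request separately keeps the validation easy to exercise offline.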

List voices

Returns every voice you can pass as the "voice" parameter on TTS / narration / audiobook calls. Cloned voices are also returned here once their status is active.

GET https://api.airforce/v1/audio/voices

curl https://api.airforce/v1/audio/voices \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"

Response format

Parameter	Type	Required	Description
voices[]	array	Optional	List of voice descriptors.
voices[].voice_id	string	Optional	Provider-native voice identifier — the field is voice_id (not id). Pass this value as "voice".
voices[].name	string	Optional	Human-readable name.
voices[].description	string	Optional	Short description, when the upstream exposes one.
voices[].category	string	Optional	"premade" | "cloned" | "professional".
voices[].preview_url	string	Optional	Short audio sample, when the upstream exposes one.
voices[].labels	object	Optional	Free-form metadata: gender, language, accent, age, use case.
live	boolean	Optional	true when the catalog came from a live upstream call; false when served from the built-in premade fallback.

{
  "voices": [
    {
      "voice_id": "CwhRBWXzGAHq8TQ4Fs17",
      "name": "Roger - Laid-Back, Casual, Resonant",
      "description": "Easy going and perfect for casual conversations.",
      "preview_url": "https://.../58ee3ff5.mp3",
      "category": "premade",
      "labels": {"accent": "american", "gender": "male", "language": "en", "use_case": "conversational"}
    }
  ],
  "live": true
}
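
Once the catalog is parsed, picking a voice is a matter of filtering on category and labels. The helper below is a hypothetical sketch (find_voices is not part of the API); it assumes the response has already been decoded into a dict shaped like the example above.

```python
def find_voices(catalog, category=None, **labels):
    """Return voice_id values for voices matching a category and label filters."""
    matches = []
    for voice in catalog.get("voices", []):
        if category is not None and voice.get("category") != category:
            continue
        # labels is free-form metadata and may be missing for some voices.
        voice_labels = voice.get("labels") or {}
        if all(voice_labels.get(key) == value for key, value in labels.items()):
            matches.append(voice["voice_id"])
    return matches
```

For instance, find_voices(catalog, category="premade", language="en") returns the IDs to pass as "voice".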

Music generation

Generates a full music track from a text prompt. Returns binary audio.

POST https://api.airforce/v1/audio/music

This endpoint serves the native music models (e.g. music-v1). Suno models (suno-*) are not available here and return provider_not_supported. Call them through the /v1/images/generations endpoint instead (see the Media reference).

Parameter	Type	Required	Description
model	string	Required	Music model ID, e.g. "music-v1".
prompt	string	Required	Style / mood / structure description.
duration_seconds	integer	Optional	Track length. Range depends on the model (typically 15–120 s).
response_format	string	Optional	"mp3" (default) or provider-native.
instrumental	boolean	Optional	When true, suppresses vocals.
style	string	Optional	Optional genre tag list, e.g. "EDM, bass, dark".

curl https://api.airforce/v1/audio/music \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output track.mp3 \
  -d '{
    "model": "music-v1",
    "prompt": "Lofi hip-hop beat with soft piano and rain",
    "duration_seconds": 60,
    "instrumental": true
  }'
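
A small Python sketch of building the same request body, with a guard for the suno-* restriction described in this section. The function name build_music_payload is hypothetical, and rejecting suno-* client-side is just one way to avoid a round trip that would end in provider_not_supported.

```python
import json


def build_music_payload(model, prompt, duration_seconds=None,
                        instrumental=None, style=None):
    """Return the JSON body for POST /v1/audio/music as a string."""
    if model.startswith("suno-"):
        # This reference says suno-* models return provider_not_supported here.
        raise ValueError("suno-* models are not served by /v1/audio/music")
    payload = {"model": model, "prompt": prompt}
    optional = {"duration_seconds": duration_seconds,
                "instrumental": instrumental, "style": style}
    # Only send the optional fields the caller actually set.
    payload.update({key: value for key, value in optional.items()
                    if value is not None})
    return json.dumps(payload)
```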

Sound effects

Short SFX from a text prompt. Same shape as music, just with shorter durations.

POST https://api.airforce/v1/audio/sound-effects

Parameter	Type	Required	Description
model	string	Required	SFX model ID.
prompt	string	Required	Effect description, e.g. "thunder rumble fading into rain".
duration_seconds	integer	Optional	Length, typically 0.5–22 s.
response_format	string	Optional	"mp3" (default).

curl https://api.airforce/v1/audio/sound-effects \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output thunder.mp3 \
  -d '{
    "model": "sfx-v1",
    "prompt": "Distant thunder rolling, then rain",
    "duration_seconds": 8
  }'

Transcription (speech-to-text)

Multipart upload of an audio file. Returns the transcribed text.

POST https://api.airforce/v1/audio/transcriptions

Transcription models

…· live

Parameter	Type	Required	Description
model	string	Required	Transcription model ID. See the live list below for valid IDs.
file	binary	Required	Audio file. Supports mp3, wav, m4a, flac, ogg, webm.
language_code	string	Optional	ISO-639-1 language hint (also accepted as "language"). Auto-detected when omitted.
diarize	boolean	Optional	Separate speakers. When true, each word carries a speaker_id.
num_speakers	integer	Optional	Expected speaker count, used together with diarize.
tag_audio_events	boolean	Optional	Mark non-speech events (laughter, silence, music) in the output.
timestamps_granularity	string	Optional	"word" (default) or "character".
additional_formats	string	Optional	Request extra rendered outputs (e.g. srt / vtt) alongside the JSON.

curl https://api.airforce/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "file=@meeting.mp3" \
  -F "model=elevenlabs-scribe" \
  -F "language_code=de" \
  -F "diarize=true"
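
For clients without a multipart helper, the upload can be assembled by hand. The sketch below uses only the Python standard library; encode_multipart and build_transcription_request are hypothetical names, the file part is sent as application/octet-stream as an assumption, and booleans are serialised as "true"/"false" to match the curl example.

```python
import urllib.request
import uuid


def encode_multipart(fields, files):
    """Encode text fields and files as a multipart/form-data body.

    fields: dict of field name -> str
    files: list of (field_name, filename, content_bytes, content_type)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"'.encode(),
                  b"", str(value).encode("utf-8")]
    for name, filename, content, content_type in files:
        lines += [f"--{boundary}".encode(),
                  (f'Content-Disposition: form-data; name="{name}"; '
                   f'filename="{filename}"').encode(),
                  f"Content-Type: {content_type}".encode(),
                  b"", content]
    lines += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(api_key, audio_bytes, filename, model, **options):
    """Build (but do not send) a POST to /v1/audio/transcriptions."""
    fields = {"model": model}
    for key, value in options.items():
        # Form values are text, so booleans become "true" / "false".
        fields[key] = str(value).lower() if isinstance(value, bool) else str(value)
    body, content_type = encode_multipart(
        fields, [("file", filename, audio_bytes, "application/octet-stream")])
    return urllib.request.Request(
        "https://api.airforce/v1/audio/transcriptions", data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": content_type},
        method="POST")
```

The same encode_multipart helper would also fit the other multipart endpoints in this section, since they share the file field convention.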

Response format

{
  "language_code": "deu",
  "language_probability": 0.98,
  "text": "Willkommen zum Meeting...",
  "words": [
    {"text": "Willkommen", "start": 0.0, "end": 0.62, "type": "word", "logprob": -0.08, "speaker_id": "speaker_0"},
    {"text": " ", "start": 0.62, "end": 0.62, "type": "spacing", "logprob": 0.0}
  ],
  "audio_duration_secs": 412.5,
  "transcription_id": "tx_01HXY..."
}

The response follows the upstream provider's native shape (ElevenLabs Scribe), not OpenAI Whisper. Tokens are returned as a flat words[] array rather than segments[] (each element carries a type of word/spacing and a logprob). The duration is in audio_duration_secs, and language_code is ISO-639-3 (e.g. eng, deu). The per-word speaker_id is present only when you pass diarize=true.
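
Because the output is a flat words[] array, grouping it into per-speaker turns is left to the client. The following is one possible sketch, assuming a response decoded into a dict and produced with diarize=true; speaker_turns is a hypothetical helper, and attaching spacing (and any other non-word tokens) to the current turn is a design choice, not something the API prescribes.

```python
def speaker_turns(transcript):
    """Group the flat words[] array into (speaker_id, text) turns."""
    turns = []  # each entry is [speaker_id, accumulated_text]
    for token in transcript.get("words", []):
        if token.get("type") == "word":
            speaker = token.get("speaker_id")
            # A word from a different speaker starts a new turn.
            if not turns or turns[-1][0] != speaker:
                turns.append([speaker, ""])
        if turns:
            # Spacing tokens carry no speaker_id; keep them with the current turn.
            turns[-1][1] += token["text"]
    return [(speaker, text.strip()) for speaker, text in turns]
```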

Audio isolation

Strips background noise from a clip while keeping the foreground voice. Multipart upload; returns audio.

POST https://api.airforce/v1/audio/audio-isolation

Parameter	Type	Required	Description
model	string	Required	Isolation model ID.
file	binary	Required	Input audio.

curl https://api.airforce/v1/audio/audio-isolation \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=isolation-v1" \
  -F "file=@noisy.mp3" \
  --output clean.mp3

Voice changer (speech-to-speech)

Takes input speech and re-renders it in a different voice while preserving timing and intonation.

POST https://api.airforce/v1/audio/voice-changer

Parameter	Type	Required	Description
model	string	Required	Voice-change model ID.
voice	string	Required	Target voice ID. Same catalog as TTS.
file	binary	Required	Input audio.
voice_settings	object	Optional	Optional ElevenLabs-shape settings (stability, similarity_boost, …).

curl https://api.airforce/v1/audio/voice-changer \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=voice-changer-v1" \
  -F "voice=21m00Tcm4TlvDq8ikWAM" \
  -F "file=@input.mp3" \
  --output transformed.mp3

Dubbing

Asynchronous dubbing into one target language. Returns a dubbing_id immediately. Poll until status is "dubbed", then download the dubbed audio for that language.

1. Create job

POST https://api.airforce/v1/audio/dubbing

Parameter	Type	Required	Description
model	string	Required	Dubbing model ID.
file	binary	Required	Source audio or video (mp3, wav, m4a, mp4 — audio is extracted automatically). Alternatively pass source_url.
target_lang	string	Required	Target language code (ISO-639-1). One language per job — repeating the field does not add languages.
source_lang	string	Optional	Source language. "auto" or omit for auto-detect.
num_speakers	integer	Optional	Hint for diarization. Auto when omitted.
drop_background_audio	boolean	Optional	Remove background music / noise from the dub.
watermark	boolean	Optional	Add an audible watermark to the output.

curl https://api.airforce/v1/audio/dubbing \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=dubbing-v1" \
  -F "file=@english.mp4" \
  -F "target_lang=de" \
  -F "source_lang=en"

{
  "dubbing_id": "abc123def456",
  "expected_duration_sec": 42.5
}

2. Poll status

GET https://api.airforce/v1/audio/dubbing/:dubbing_id

The status is forwarded from the provider as-is. status is "dubbing" while the job is running and "dubbed" on completion (not "completed"). Languages live under target_languages (not available_languages), and there is no progress field.

{
  "dubbing_id": "abc123def456",
  "status": "dubbed",
  "source_language": "en",
  "target_languages": ["de"],
  "media_metadata": {"duration": 42.5, "content_type": "video/mp4"},
  "name": "english.mp4",
  "created_at": "2026-05-06T22:30:00Z",
  "editable": false,
  "error": null
}
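
The polling step can be sketched as a small loop. This is an illustration under stated assumptions rather than a reference client: fetch_dubbing_status and wait_for_dub are hypothetical names, the interval and timeout defaults are arbitrary, and treating a non-null error field as a failure is an assumption based on the response shape above.

```python
import json
import time
import urllib.request


def fetch_dubbing_status(api_key, dubbing_id):
    """GET /v1/audio/dubbing/:dubbing_id and return the parsed JSON."""
    request = urllib.request.Request(
        f"https://api.airforce/v1/audio/dubbing/{dubbing_id}",
        headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_dub(fetch_status, dubbing_id, interval=5.0, timeout=600.0,
                 sleep=time.sleep):
    """Poll until status is "dubbed"; raise on error or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(dubbing_id)
        if job.get("status") == "dubbed":  # completion value is "dubbed"
            return job
        if job.get("error"):
            raise RuntimeError(f"dubbing failed: {job['error']}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dubbing {dubbing_id} not finished in {timeout}s")
        sleep(interval)
```

A caller would typically pass lambda job_id: fetch_dubbing_status(api_key, job_id) as fetch_status; injecting the fetcher and the sleep function keeps the loop testable without network access.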

3. Download per language

GET https://api.airforce/v1/audio/dubbing/:dubbing_id/audio/:lang

curl https://api.airforce/v1/audio/dubbing/abc123def456/audio/de \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  --output german.mp3

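Voice cloning

Clone a voice from short audio samples and reuse it across every audio endpoint. Voice cloning requires explicit consent: fetch the current consent text, hash it, and send the hash along with the samples.

1. Fetch consent text

GET https://api.airforce/v1/voices/consent-text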

{
  "text": "I confirm that the voice samples I am uploading are either my own voice or a voice I have explicit permission to clone…",
  "hash": "9f4b0c8d2e…"
}
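
A client can sanity-check the returned hash before submitting it. The sketch below assumes the hash is the lowercase hex SHA-256 digest of the UTF-8 encoded text exactly as returned; that encoding is an assumption, not something this reference states, so on a mismatch the helper falls back to the server-provided hash field. Both function names are hypothetical.

```python
import hashlib


def consent_hash(text):
    """Hex SHA-256 of the consent text (UTF-8 encoding is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def resolve_consent_hash(consent):
    """Return (hash_to_send, matches_local) for a consent-text response."""
    server_hash = consent.get("hash")
    local_hash = consent_hash(consent["text"])
    if server_hash is None:
        return local_hash, True
    # Prefer the server's value; the flag reports whether our local
    # computation agrees with it.
    return server_hash, server_hash == local_hash
```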

2. Create the clone

POST https://api.airforce/v1/voices/clone

Parameter	Type	Required	Description
name	string	Required	Public voice name shown in the library.
description	string	Optional	Optional free-text description.
consent_hash	string	Required	SHA-256 of the consent paragraph. Fetch the current text via GET /v1/voices/consent-text and pass its hash field.
files	binary	Required	1–25 audio samples. Repeat the form field per file. Total ≤ 200 MB. 30 s – 3 min per clip works best.

curl https://api.airforce/v1/voices/clone \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "name=My voice" \
  -F "description=Calm, conversational" \
  -F "consent_hash=9f4b0c8d2e..." \
  -F "files=@sample1.mp3" \
  -F "files=@sample2.mp3"

{
  "voice_id": "voice_01HXY...",
  "name": "My voice",
  "status": "active",
  "created_at": "2026-05-06T22:30:00Z"
}

Note on field names: the create response returns the new voice as voice_id, whereas GET /v1/voices/library lists clones under provider_voice_id. Both hold the same identifier, which is the value you pass as voice.

3. List your library

GET https://api.airforce/v1/voices/library

curl https://api.airforce/v1/voices/library \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"

Parameter	Type	Required	Description
voices[].provider_voice_id	string	Optional	Pass as "voice" on TTS / voice-changer endpoints.
voices[].status	string	Optional	"active" | "errored" | "deleting".
voices[].provider	string	Optional	Upstream that hosts the clone.
voices[].last_error	string	Optional	Set when status is "errored".
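
As a rough illustration of consuming the library response, the helper below separates clones that are ready to use from those that failed. It assumes the response is a dict with a voices[] array carrying the fields in the table above; usable_clone_ids is a hypothetical name.

```python
def usable_clone_ids(library):
    """Split library entries into usable IDs and a map of errored IDs."""
    usable, problems = [], {}
    for voice in library.get("voices", []):
        voice_id = voice.get("provider_voice_id")
        status = voice.get("status")
        if status == "active":
            # Only active clones can be passed as "voice".
            usable.append(voice_id)
        elif status == "errored":
            problems[voice_id] = voice.get("last_error")
        # Entries in "deleting" are skipped entirely.
    return usable, problems
```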

4. Update / delete

PATCH https://api.airforce/v1/voices/clone/:id

DELETE https://api.airforce/v1/voices/clone/:id

PATCH accepts name and description in a JSON body. DELETE removes the voice both locally and at the upstream provider.
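
The two calls could be built as follows. This is a sketch only: the helper names are hypothetical, and it assumes that :id is the same identifier returned as voice_id by the create call, which this reference does not state explicitly.

```python
import json
import urllib.request
from urllib.parse import quote

CLONE_URL = "https://api.airforce/v1/voices/clone"


def build_update_request(api_key, voice_id, name=None, description=None):
    """Build a PATCH request carrying name and/or description as JSON."""
    body = {key: value
            for key, value in (("name", name), ("description", description))
            if value is not None}
    if not body:
        raise ValueError("pass at least one of name or description")
    return urllib.request.Request(
        f"{CLONE_URL}/{quote(voice_id, safe='')}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="PATCH")


def build_delete_request(api_key, voice_id):
    """Build a DELETE request for the clone (no request body)."""
    return urllib.request.Request(
        f"{CLONE_URL}/{quote(voice_id, safe='')}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE")
```

Either request can then be sent with urllib.request.urlopen.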

Notes

Audio responses are returned as raw bytes with the appropriate Content-Type. PCM / µ-law formats are wrapped in a minimal WAV header so they play in the browser as-is.
Multipart endpoints (transcriptions, isolation, voice-changer, dubbing, cloning) accept up to 200 MB per request.
Voice IDs work across providers. A cloned ElevenLabs voice can be passed straight to /v1/audio/voice-changer.
Cost is metered per character (TTS), per second (music / SFX / dubbing / voice-changer), or per audio minute (transcription) and deducted from your balance. Audio endpoints do not send the X-Cost-Cents response header. Track usage in the dashboard's usage log.
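
Since pcm_* responses arrive wrapped in a WAV header, the header can be inspected with Python's standard wave module. The sketch below is illustrative: describe_wav is a hypothetical helper, and it applies only to the PCM formats, because the wave module reads uncompressed PCM and rejects µ-law encoded files.

```python
import io
import wave


def describe_wav(audio_bytes):
    """Read basic properties from a WAV-wrapped PCM response body."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_secs": frames / rate,
        }
```

For a response requested with response_format "pcm_24000", one would expect sample_rate to come back as 24000, though the channel count and sample width depend on the upstream provider.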