API REFERENCE

Audio

Text-to-speech, speech-to-text, music, sound effects, voice changing, dubbing, and voice cloning: one API key for every provider.

A single audio surface covers text-to-speech, transcription, music, sound effects, dubbing, voice changing, and voice cloning. The core endpoints are OpenAI-compatible, and richer extras (voice settings, speaker diarization, dubbing) are accepted wherever the upstream provider supports them.

List the available voices first; cloned voices you create appear in the same list and are used the same way.

Endpoints in this section, all under /v1/audio: /speech, /music, /sound-effects, /transcriptions, /audio-isolation, /voice-changer, /dubbing, /voices, plus /v1/voices/* for cloning.

Text-to-speech

Synthesizes speech from text. Returns raw audio bytes with a matching Content-Type (e.g. audio/mpeg). PCM and µ-law formats are wrapped in a WAV header so they play directly in browsers.

POST https://api.airforce/v1/audio/speech

Parameter	Type	Required	Description
model	string	Required	TTS model ID. See /v1/models for IDs with input_modalities containing "text" and output_modalities containing "audio".
input	string	Required	Text to synthesise. Long inputs are chunked automatically.
voice	string	Required	Voice ID. Use GET /v1/audio/voices to list options. Cloned voices appear here too.
response_format	string	Optional	"mp3" (default), "mp3_44100_128", "mp3_44100_192", "pcm_22050", "pcm_24000", "pcm_44100", "ulaw_8000".
speed	float	Optional	0.25 – 4.0. OpenAI-compatible. Some upstream providers ignore this.
voice_settings	object	Optional	ElevenLabs-shape: { stability: 0–1, similarity_boost: 0–1, style: 0–1, use_speaker_boost: bool }.
language_code	string	Optional	ISO-639-1 hint, e.g. "de", "en", "ja". Improves prosody for multilingual models.
seed	integer	Optional	Reproducibility seed where supported.
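
As a rough illustration of the response_format values in the table above, the sketch below splits a format string into codec and sample rate and picks a file extension. The helper names are hypothetical, and the ".wav" choice for PCM and µ-law is an assumption based on the WAV-header wrapping described in this section.

```python
def describe_format(response_format="mp3"):
    """Split a response_format value into codec, sample rate, and a file extension."""
    codec, _, rest = response_format.partition("_")
    # The first numeric part after the codec is the sample rate, e.g. "pcm_24000".
    sample_rate = int(rest.split("_")[0]) if rest else None
    # Assumption: PCM and µ-law arrive WAV-wrapped, so ".wav" is a sensible extension.
    extension = ".mp3" if codec == "mp3" else ".wav"
    return {"codec": codec, "sample_rate": sample_rate, "extension": extension}


def looks_like_wav(data):
    """Check for the RIFF/WAVE magic bytes that open a WAV container."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"
```

For example, describe_format("ulaw_8000") reports an 8000 Hz sample rate and a ".wav" extension, and looks_like_wav can be run on the first bytes of a response before saving it.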

Example

curl https://api.airforce/v1/audio/speech \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "elevenlabs-multilingual-v2",
    "input": "Willkommen bei Airforce.",
    "voice": "21m00Tcm4TlvDq8ikWAM",
    "response_format": "mp3_44100_128",
    "voice_settings": {"stability": 0.6, "similarity_boost": 0.8}
  }'
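
The same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: build_speech_request and synthesize_to_file are hypothetical helper names, and the default model and format simply mirror the curl example above.

```python
import json
import urllib.request

API_BASE = "https://api.airforce/v1"


def build_speech_request(api_key, text, voice,
                         model="elevenlabs-multilingual-v2",
                         response_format="mp3_44100_128",
                         voice_settings=None):
    """Build (but do not send) a POST /v1/audio/speech request."""
    payload = {"model": model, "input": text, "voice": voice,
               "response_format": response_format}
    if voice_settings:
        payload["voice_settings"] = voice_settings
    return urllib.request.Request(
        f"{API_BASE}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, path):
    """Send the request and write the raw audio bytes to disk."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any audio is billed.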

List voices

Returns every voice that can be passed as the "voice" parameter in TTS, voice-over, and audiobook calls. Cloned voices are also returned here once their status is active.

GET https://api.airforce/v1/audio/voices

curl https://api.airforce/v1/audio/voices \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"

Response format

Parameter	Type	Required	Description
voices[]	array	Optional	List of voice descriptors.
voices[].voice_id	string	Optional	Provider-native voice identifier — the field is voice_id (not id). Pass this value as "voice".
voices[].name	string	Optional	Human-readable name.
voices[].description	string	Optional	Short description, when the upstream exposes one.
voices[].category	string	Optional	"premade" | "cloned" | "professional".
voices[].preview_url	string	Optional	Short audio sample, when the upstream exposes one.
voices[].labels	object	Optional	Free-form metadata: gender, language, accent, age, use case.
live	boolean	Optional	true when the catalog came from a live upstream call; false when served from the built-in premade fallback.

{
  "voices": [
    {
      "voice_id": "CwhRBWXzGAHq8TQ4Fs17",
      "name": "Roger - Laid-Back, Casual, Resonant",
      "description": "Easy going and perfect for casual conversations.",
      "preview_url": "https://.../58ee3ff5.mp3",
      "category": "premade",
      "labels": {"accent": "american", "gender": "male", "language": "en", "use_case": "conversational"}
    }
  ],
  "live": true
}
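
Once the catalog is fetched, picking a voice is a matter of filtering on category and labels. The sketch below assumes the response shape shown above; find_voices is a hypothetical helper, and SAMPLE_CATALOG is a trimmed copy of the example response.

```python
SAMPLE_CATALOG = {
    "voices": [
        {
            "voice_id": "CwhRBWXzGAHq8TQ4Fs17",
            "name": "Roger - Laid-Back, Casual, Resonant",
            "category": "premade",
            "labels": {"accent": "american", "gender": "male", "language": "en"},
        }
    ],
    "live": True,
}


def find_voices(catalog, category=None, **labels):
    """Return voices whose category and labels match every given filter."""
    matches = []
    for voice in catalog.get("voices", []):
        if category is not None and voice.get("category") != category:
            continue
        # labels is free-form metadata, so a missing key simply fails the match.
        voice_labels = voice.get("labels") or {}
        if all(voice_labels.get(key) == value for key, value in labels.items()):
            matches.append(voice)
    return matches
```

The voice_id of the first match is the value to pass as "voice" in a speech request.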

Music generation

Generates a full music track from a text prompt. Returns binary audio.

POST https://api.airforce/v1/audio/music

This endpoint serves native music models (e.g. music-v1). Suno models (suno-*) are not available here and return provider_not_supported; call them through the /v1/images/generations endpoint instead (see the Media reference).

Parameter	Type	Required	Description
model	string	Required	Music model ID, e.g. "music-v1".
prompt	string	Required	Style / mood / structure description.
duration_seconds	integer	Optional	Track length. Range depends on the model (typically 15–120 s).
response_format	string	Optional	"mp3" (default) or provider-native.
instrumental	boolean	Optional	When true, suppresses vocals.
style	string	Optional	Genre tag list, e.g. "EDM, bass, dark".

curl https://api.airforce/v1/audio/music \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output track.mp3 \
  -d '{
    "model": "music-v1",
    "prompt": "Lofi hip-hop beat with soft piano and rain",
    "duration_seconds": 60,
    "instrumental": true
  }'
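
Because suno-* models are rejected on this endpoint, a client can catch the mistake before sending anything. The following is a minimal sketch: build_music_payload is a hypothetical helper, and "suno-example" in the usage note is a made-up model ID used only to show the guard.

```python
def build_music_payload(prompt, model="music-v1", duration_seconds=None,
                        instrumental=None, style=None):
    """Assemble the JSON body for POST /v1/audio/music, omitting unset options."""
    if model.startswith("suno-"):
        # The server would answer provider_not_supported; fail early instead.
        raise ValueError(
            f"{model} is not served by /v1/audio/music; "
            "use /v1/images/generations for Suno models"
        )
    payload = {"model": model, "prompt": prompt}
    optional = {
        "duration_seconds": duration_seconds,
        "instrumental": instrumental,
        "style": style,
    }
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

Calling build_music_payload("Lofi beat", model="suno-example") raises ValueError, while the default model produces a body equivalent to the curl example.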

Sound effects

Short SFX from a text prompt. Same request shape as music, but with shorter durations.

POST https://api.airforce/v1/audio/sound-effects

Parameter	Type	Required	Description
model	string	Required	SFX model ID.
prompt	string	Required	Effect description, e.g. "thunder rumble fading into rain".
duration_seconds	integer	Optional	Length, typically 0.5–22 s.
response_format	string	Optional	"mp3" (default).

curl https://api.airforce/v1/audio/sound-effects \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output thunder.mp3 \
  -d '{
    "model": "sfx-v1",
    "prompt": "Distant thunder rolling, then rain",
    "duration_seconds": 8
  }'

Transcription (speech-to-text)

Multipart upload of an audio file. Returns the transcribed text.

POST https://api.airforce/v1/audio/transcriptions

Parameter	Type	Required	Description
model	string	Required	Transcription model ID, e.g. "elevenlabs-scribe".
file	binary	Required	Audio file. Supports mp3, wav, m4a, flac, ogg, webm.
language_code	string	Optional	ISO-639-1 language hint (also accepted as "language"). Auto-detected when omitted.
diarize	boolean	Optional	Separate speakers. When true, each word carries a speaker_id.
num_speakers	integer	Optional	Expected speaker count, used together with diarize.
tag_audio_events	boolean	Optional	Mark non-speech events (laughter, silence, music) in the output.
timestamps_granularity	string	Optional	"word" (default) or "character".
additional_formats	string	Optional	Request extra rendered outputs (e.g. srt / vtt) alongside the JSON.

curl https://api.airforce/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "file=@meeting.mp3" \
  -F "model=elevenlabs-scribe" \
  -F "language_code=de" \
  -F "diarize=true"
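
The multipart endpoints can also be called without curl. The sketch below hand-encodes multipart/form-data with the Python standard library. encode_multipart and build_transcription_request are hypothetical helpers, and sending booleans as the strings "true" / "false" is an assumption taken from the curl example.

```python
import urllib.request
import uuid

API_BASE = "https://api.airforce/v1"


def encode_multipart(fields, files):
    """Encode text fields and (field, filename, content, content_type) tuples."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    for field, filename, content, content_type in files:
        head = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{field}"; '
            f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n'
        )
        chunks.append(head.encode("utf-8") + content + b"\r\n")
    # The closing delimiter carries two trailing dashes.
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(api_key, audio, filename,
                                model="elevenlabs-scribe",
                                language_code=None, diarize=False):
    """Build (but do not send) a POST /v1/audio/transcriptions request."""
    fields = {"model": model, "diarize": "true" if diarize else "false"}
    if language_code:
        fields["language_code"] = language_code
    body, content_type = encode_multipart(
        fields, [("file", filename, audio, "application/octet-stream")]
    )
    return urllib.request.Request(
        f"{API_BASE}/audio/transcriptions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends the upload. The same encoder would serve the other multipart endpoints with a different path and field set.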

Response format

{
  "language_code": "deu",
  "language_probability": 0.98,
  "text": "Willkommen zum Meeting...",
  "words": [
    {"text": "Willkommen", "start": 0.0, "end": 0.62, "type": "word", "logprob": -0.08, "speaker_id": "speaker_0"},
    {"text": " ", "start": 0.62, "end": 0.62, "type": "spacing", "logprob": 0.0}
  ],
  "audio_duration_secs": 412.5,
  "transcription_id": "tx_01HXY..."
}

The response follows the upstream provider's native format (ElevenLabs Scribe), not OpenAI Whisper: tokens come back in a flat words[] array rather than segments[] (each with a type of word or spacing and a logprob), the duration is audio_duration_secs, and language_code is ISO-639-3 (e.g. eng, deu). The per-word speaker_id is included only when you pass diarize=true.
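
Since the transcript arrives as a flat words[] array, grouping it into speaker turns is left to the client. A minimal sketch of one way to do that is shown here, assuming diarize=true was passed. speaker_turns is a hypothetical helper, and the tokens in the usage note beyond the first two are made-up sample data.

```python
def speaker_turns(words):
    """Merge consecutive tokens from the same speaker into one turn each."""
    turns = []
    for token in words:
        if token.get("type") == "spacing":
            # Spacing tokens carry no speaker_id; attach them to the open turn.
            if turns:
                turns[-1]["text"] += token["text"]
            continue
        speaker = token.get("speaker_id")
        if turns and turns[-1]["speaker_id"] == speaker:
            turns[-1]["text"] += token["text"]
            turns[-1]["end"] = token["end"]
        else:
            turns.append({
                "speaker_id": speaker,
                "text": token["text"],
                "start": token["start"],
                "end": token["end"],
            })
    for turn in turns:
        turn["text"] = turn["text"].strip()
    return turns
```

Fed the words "Willkommen", "zum" (speaker_0) and "Danke" (speaker_1) with spacing tokens in between, this yields two turns, each with the start of its first word and the end of its last.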

Audio isolation

Removes background noise from a clip while keeping the foreground voice. Multipart upload; returns audio.

POST https://api.airforce/v1/audio/audio-isolation

Parameter	Type	Required	Description
model	string	Required	Isolation model ID.
file	binary	Required	Input audio.

curl https://api.airforce/v1/audio/audio-isolation \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=isolation-v1" \
  -F "file=@noisy.mp3" \
  --output clean.mp3

Voice changer (speech-to-speech)

Takes input speech and re-renders it in a different voice while preserving timing and intonation.

POST https://api.airforce/v1/audio/voice-changer

Parameter	Type	Required	Description
model	string	Required	Voice-change model ID.
voice	string	Required	Target voice ID. Same catalog as TTS.
file	binary	Required	Input audio.
voice_settings	object	Optional	ElevenLabs-shape settings (stability, similarity_boost, …).

curl https://api.airforce/v1/audio/voice-changer \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=voice-changer-v1" \
  -F "voice=21m00Tcm4TlvDq8ikWAM" \
  -F "file=@input.mp3" \
  --output transformed.mp3

Dubbing

Asynchronous dubbing into a single target language. The call returns a dubbing_id immediately. Poll until the status is "dubbed", then download the dubbed audio for that language.

1. Create job

POST https://api.airforce/v1/audio/dubbing

Parameter	Type	Required	Description
model	string	Required	Dubbing model ID.
file	binary	Required	Source audio or video (mp3, wav, m4a, mp4 — audio is extracted automatically). Alternatively pass source_url.
target_lang	string	Required	Target language code (ISO-639-1). One language per job — repeating the field does not add languages.
source_lang	string	Optional	Source language. "auto" or omit for auto-detect.
num_speakers	integer	Optional	Hint for diarization. Auto when omitted.
drop_background_audio	boolean	Optional	Remove background music / noise from the dub.
watermark	boolean	Optional	Add an audible watermark to the output.

curl https://api.airforce/v1/audio/dubbing \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=dubbing-v1" \
  -F "file=@english.mp4" \
  -F "target_lang=de" \
  -F "source_lang=en"

{
  "dubbing_id": "abc123def456",
  "expected_duration_sec": 42.5
}

2. Poll status

GET https://api.airforce/v1/audio/dubbing/:dubbing_id

The status is passed through from the provider as-is. status reads "dubbing" while the job is in progress and "dubbed" when it is complete (not "completed"). Languages are listed under target_languages, not available_languages, and there is no progress field.

{
  "dubbing_id": "abc123def456",
  "status": "dubbed",
  "source_language": "en",
  "target_languages": ["de"],
  "media_metadata": {"duration": 42.5, "content_type": "video/mp4"},
  "name": "english.mp4",
  "created_at": "2026-05-06T22:30:00Z",
  "editable": false,
  "error": null
}
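
The polling step can be sketched as a small loop. This is one possible shape, not a prescribed client: wait_for_dub and fetch_dubbing_status are hypothetical helpers, the interval and timeout defaults are arbitrary, and treating any status other than "dubbing" or "dubbed" as a failure is an assumption, since the provider's full status list is not documented here.

```python
import json
import time
import urllib.request

API_BASE = "https://api.airforce/v1"


def fetch_dubbing_status(api_key, dubbing_id):
    """GET /v1/audio/dubbing/:dubbing_id and decode the JSON body."""
    request = urllib.request.Request(
        f"{API_BASE}/audio/dubbing/{dubbing_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_dub(fetch_status, dubbing_id, interval=5.0, timeout=600.0,
                 sleep=time.sleep):
    """Poll until status is "dubbed"; raise on error, unknown status, or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(dubbing_id)
        if job.get("error"):
            raise RuntimeError(f"dubbing failed: {job['error']}")
        status = job.get("status")
        if status == "dubbed":
            return job
        if status != "dubbing":
            # Assumption: anything else means the job will not finish.
            raise RuntimeError(f"unexpected dubbing status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dubbing {dubbing_id} not finished after {timeout} s")
        sleep(interval)
```

In real use, fetch_status would be something like lambda job_id: fetch_dubbing_status(api_key, job_id). Injecting it keeps the loop testable without network access.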

3. Download per language

GET https://api.airforce/v1/audio/dubbing/:dubbing_id/audio/:lang

curl https://api.airforce/v1/audio/dubbing/abc123def456/audio/de \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  --output german.mp3
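
Before downloading, it can help to confirm that the job is finished and that the language was actually requested. A small sketch, assuming the status object from step 2; dub_audio_url is a hypothetical helper.

```python
from urllib.parse import quote

API_BASE = "https://api.airforce/v1"


def dub_audio_url(job, lang):
    """Return the download URL for one language of a finished dubbing job."""
    if job.get("status") != "dubbed":
        raise ValueError(f"job is not finished (status={job.get('status')!r})")
    if lang not in job.get("target_languages", []):
        raise ValueError(f"{lang!r} is not one of {job.get('target_languages')}")
    # Percent-encode both path segments so unusual IDs cannot break the URL.
    job_id = quote(job["dubbing_id"], safe="")
    return f"{API_BASE}/audio/dubbing/{job_id}/audio/{quote(lang, safe='')}"
```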

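Voice cloning

Clone a voice from short audio samples and reuse it on every voice endpoint. Voice cloning requires explicit consent: fetch the current consent text, take its hash (the response includes it), and submit that hash together with your samples.

1. Fetch consent text

GET https://api.airforce/v1/voices/consent-text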

{
  "text": "I confirm that the voice samples I am uploading are either my own voice or a voice I have explicit permission to clone…",
  "hash": "9f4b0c8d2e…"
}

2. Create the clone

POST https://api.airforce/v1/voices/clone

Parameter	Type	Required	Description
name	string	Required	Public voice name shown in the library.
description	string	Optional	Free-text description.
consent_hash	string	Required	SHA-256 of the consent paragraph. Fetch the current text via GET /v1/voices/consent-text and pass its hash field.
files	binary	Required	1–25 audio samples. Repeat the form field per file. Total ≤ 200 MB. 30 s – 3 min per clip works best.
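
The consent_hash row describes the value as the SHA-256 of the consent paragraph. The sketch below recomputes it locally as a sanity check. The UTF-8 encoding and lowercase hex output are assumptions, and both function names are hypothetical. When in doubt, submit the hash field returned by the server rather than a locally computed one.

```python
import hashlib


def consent_hash(consent_text):
    """SHA-256 of the consent text as lowercase hex (UTF-8 encoding assumed)."""
    return hashlib.sha256(consent_text.encode("utf-8")).hexdigest()


def hash_matches(consent):
    """Compare the server-provided hash with a locally computed one."""
    return consent_hash(consent["text"]) == consent["hash"]
```

A mismatch most likely means the encoding assumption is wrong or the text was altered in transit; in either case the server's hash field remains the value to send.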

curl https://api.airforce/v1/voices/clone \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "name=My voice" \
  -F "description=Calm, conversational" \
  -F "consent_hash=9f4b0c8d2e..." \
  -F "files=@sample1.mp3" \
  -F "files=@sample2.mp3"

{
  "voice_id": "voice_01HXY...",
  "name": "My voice",
  "status": "active",
  "created_at": "2026-05-06T22:30:00Z"
}

Watch the field names: the create response returns the new voice as voice_id, whereas GET /v1/voices/library lists clones under provider_voice_id. Both hold the same identifier, which is the value you pass as voice.

3. List your library

GET https://api.airforce/v1/voices/library

curl https://api.airforce/v1/voices/library \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"

Parameter	Type	Required	Description
voices[].provider_voice_id	string	Optional	Pass as "voice" on TTS / voice-changer endpoints.
voices[].status	string	Optional	"active" | "errored" | "deleting".
voices[].provider	string	Optional	Upstream that hosts the clone.
voices[].last_error	string	Optional	Set when status is "errored".
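
Because the clone response and the library use different field names for the same identifier, a small normalizing helper keeps client code uniform. A sketch under the shapes documented here; voice_id_of and usable_clones are hypothetical names, and the library entries in the usage note are made-up sample data.

```python
def voice_id_of(record):
    """Return the identifier to pass as "voice", whichever field carries it."""
    return record.get("voice_id") or record.get("provider_voice_id")


def usable_clones(library):
    """List the IDs of library voices whose status is "active"."""
    return [
        voice_id_of(voice)
        for voice in library.get("voices", [])
        if voice.get("status") == "active"
    ]
```

Given a library with one "active" and one "errored" entry, usable_clones returns only the active entry's provider_voice_id.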

4. Update / delete

PATCH https://api.airforce/v1/voices/clone/:id

DELETE https://api.airforce/v1/voices/clone/:id

PATCH accepts name and description in a JSON body. DELETE removes the voice both locally and at the upstream provider.
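
Both calls can be sketched with the standard library as follows. The helper names are hypothetical, and using the clone's voice identifier for :id is an assumption, since the section does not say which identifier the path expects.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.airforce/v1"


def build_update_request(api_key, clone_id, name=None, description=None):
    """Build a PATCH request that changes only the fields that were given."""
    changes = {key: value
               for key, value in {"name": name, "description": description}.items()
               if value is not None}
    if not changes:
        raise ValueError("pass at least one of name or description")
    return urllib.request.Request(
        f"{API_BASE}/voices/clone/{quote(clone_id, safe='')}",
        data=json.dumps(changes).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="PATCH",
    )


def build_delete_request(api_key, clone_id):
    """Build a DELETE request for one cloned voice."""
    return urllib.request.Request(
        f"{API_BASE}/voices/clone/{quote(clone_id, safe='')}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE",
    )
```

Either request is sent with urllib.request.urlopen. Since DELETE also removes the voice at the upstream provider, it is worth confirming the ID before sending.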

Notes

Audio responses are returned as raw bytes with the correct Content-Type. PCM / µ-law formats are wrapped in a minimal WAV header so they play in browsers as-is.
Multipart endpoints (transcriptions, isolation, voice-changer, dubbing, cloning) accept up to 200 MB per request.
Voice IDs work across providers: a cloned ElevenLabs voice can be passed straight to /v1/audio/voice-changer.
Costs are billed per character (TTS), per second (music / SFX / dubbing / voice-changer), or per audio minute (transcription) and are deducted from your balance. Audio endpoints do not send the X-Cost-Cents response header, so track spend in the dashboard usage log.
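
The upload limits can be checked client-side before a cloning request is sent. The sketch below takes file sizes in bytes. check_clone_samples is a hypothetical helper, and reading "200 MB" as 200,000,000 bytes is an assumption chosen because it is the stricter interpretation.

```python
MAX_REQUEST_BYTES = 200 * 1000 * 1000  # assumption: decimal megabytes
MIN_SAMPLES, MAX_SAMPLES = 1, 25       # sample count allowed by the clone endpoint


def check_clone_samples(sizes):
    """Validate sample count and combined size; return the total in bytes."""
    if not MIN_SAMPLES <= len(sizes) <= MAX_SAMPLES:
        raise ValueError(
            f"expected {MIN_SAMPLES}-{MAX_SAMPLES} samples, got {len(sizes)}"
        )
    total = sum(sizes)
    if total > MAX_REQUEST_BYTES:
        raise ValueError(f"samples total {total} bytes, over the 200 MB limit")
    return total
```

In practice the sizes would come from os.path.getsize on each sample file.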