Api.Airforce
API REFERENCE

Audio

Text-to-speech, speech-to-text, music, sound effects, voice changing, dubbing, and voice cloning: one API key covers every provider.

Endpoints in this section: /v1/audio/speech, /music, /sound-effects, /transcriptions, /audio-isolation, /voice-changer, /dubbing, /voices, plus /v1/voices/* for cloning.

Text-to-speech

Synthesizes speech from text. Returns raw audio bytes with the matching Content-Type (for example, audio/mpeg). PCM and µ-law formats include a WAV header, so they play in any browser.

POST https://api.airforce/v1/audio/speech

Parameter | Type | Required | Description
model | string | Required | TTS model ID. See /v1/models for IDs with input_modalities containing "text" and output_modalities containing "audio".
input | string | Required | Text to synthesise. Long inputs are chunked automatically.
voice | string | Required | Voice ID. Use GET /v1/audio/voices to list options. Cloned voices appear here too.
response_format | string | Optional | "mp3" (default), "mp3_44100_128", "mp3_44100_192", "pcm_22050", "pcm_24000", "pcm_44100", "ulaw_8000".
speed | float | Optional | 0.25 – 4.0. OpenAI-compatible. Some upstream providers ignore this.
voice_settings | object | Optional | ElevenLabs-shape: { stability: 0–1, similarity_boost: 0–1, style: 0–1, use_speaker_boost: bool }.
language_code | string | Optional | ISO-639-1 hint, e.g. "de", "en", "ja". Improves prosody for multilingual models.
seed | integer | Optional | Reproducibility seed where supported.

curl https://api.airforce/v1/audio/speech \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "elevenlabs-multilingual-v2",
    "input": "Willkommen bei Airforce.",
    "voice": "21m00Tcm4TlvDq8ikWAM",
    "response_format": "mp3_44100_128",
    "voice_settings": {"stability": 0.6, "similarity_boost": 0.8}
  }'
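
The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names build_speech_request and synthesize_to_file are hypothetical, and the endpoint, headers, and field names are taken from the table and curl example above.

```python
import json
import urllib.request

API_BASE = "https://api.airforce/v1"


def build_speech_request(api_key, model, text, voice, **options):
    """Build (but do not send) a POST /v1/audio/speech request."""
    payload = {"model": model, "input": text, "voice": voice}
    # Optional fields such as response_format, speed, or voice_settings
    # are passed through unchanged; options set to None are dropped.
    payload.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        f"{API_BASE}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(request, path):
    """Send the request, write the raw audio bytes to `path`,
    and return the Content-Type reported by the server."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
        return response.headers.get("Content-Type")
```

Building the request separately from sending it keeps the payload easy to inspect before any characters are billed. A call such as synthesize_to_file(build_speech_request(key, model, text, voice, response_format="mp3_44100_128"), "speech.mp3") would mirror the curl command.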

List voices

Returns every voice that can be passed as the "voice" parameter in TTS, voiceover, and audiobook calls. Cloned voices are also returned here once their status is active.

GET https://api.airforce/v1/audio/voices

curl https://api.airforce/v1/audio/voices \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"

Response format

Parameter | Type | Required | Description
voices[] | array | Optional | List of voice descriptors.
voices[].id | string | Optional | Provider-native voice identifier. Pass this as "voice".
voices[].name | string | Optional | Human-readable name.
voices[].category | string | Optional | "premade" | "cloned" | "professional".
voices[].preview_url | string | Optional | Short audio sample, when the upstream exposes one.
voices[].labels | object | Optional | Free-form metadata: gender, language, accent, age, use case.
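
As an illustration of working with this response, the sketch below filters a parsed voice listing by category and labels. The function name find_voices and the sample data are hypothetical. Because labels are free-form, the exact label keys ("language", "gender") are an assumption that may differ per provider.

```python
def find_voices(listing, category=None, **labels):
    """Return voice descriptors matching a category and exact label values."""
    matches = []
    for voice in listing.get("voices", []):
        if category is not None and voice.get("category") != category:
            continue
        # Every field is optional, so a voice may have no labels at all.
        voice_labels = voice.get("labels") or {}
        if all(voice_labels.get(key) == value for key, value in labels.items()):
            matches.append(voice)
    return matches


# Hypothetical sample in the documented response shape.
sample = {"voices": [
    {"id": "voice-a", "name": "Anna", "category": "premade",
     "labels": {"language": "de", "gender": "female"}},
    {"id": "voice-b", "name": "Ben", "category": "cloned",
     "labels": {"language": "en"}},
    {"id": "voice-c", "name": "Cara", "category": "premade"},
]}

german = find_voices(sample, category="premade", language="de")
print([voice["id"] for voice in german])  # prints ['voice-a']
```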

Music generation

Generates a full music track from a text prompt. Returns binary audio.

POST https://api.airforce/v1/audio/music

Parameter | Type | Required | Description
model | string | Required | Music model ID, e.g. "music-v1".
prompt | string | Required | Style / mood / structure description.
duration_seconds | integer | Optional | Track length. Range depends on the model (typically 15–120 s).
response_format | string | Optional | "mp3" (default) or provider-native.
instrumental | boolean | Optional | When true, suppresses vocals.
style | string | Optional | Optional genre tag list, e.g. "EDM, bass, dark".

curl https://api.airforce/v1/audio/music \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output track.mp3 \
  -d '{
    "model": "music-v1",
    "prompt": "Lofi hip-hop beat with soft piano and rain",
    "duration_seconds": 60,
    "instrumental": true
  }'

Sound effects

Short SFX from a text prompt. Same request shape as music, but with shorter durations.

POST https://api.airforce/v1/audio/sound-effects

Parameter | Type | Required | Description
model | string | Required | SFX model ID.
prompt | string | Required | Effect description, e.g. "thunder rumble fading into rain".
duration_seconds | integer | Optional | Length, typically 0.5–22 s.
response_format | string | Optional | "mp3" (default).

curl https://api.airforce/v1/audio/sound-effects \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output thunder.mp3 \
  -d '{
    "model": "sfx-v1",
    "prompt": "Distant thunder rolling, then rain",
    "duration_seconds": 8
  }'

Transcription (speech-to-text)

Multipart upload of an audio file. Returns the transcribed text.

POST https://api.airforce/v1/audio/transcriptions

Parameter | Type | Required | Description
model | string | Required | Transcription model ID, e.g. "whisper-1".
file | binary | Required | Audio file. Supports mp3, wav, m4a, flac, ogg, webm.
language | string | Optional | ISO-639-1 language hint.
prompt | string | Optional | Optional bias prompt to steer vocabulary (jargon, names).
response_format | string | Optional | "json" (default), "text", "srt", "vtt", "verbose_json".
temperature | float | Optional | 0–1. Higher allows more creative interpretation of unclear audio.

curl https://api.airforce/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "file=@audio.mp3" \
  -F "model=whisper-1" \
  -F "language=de" \
  -F "response_format=verbose_json"

Response (verbose_json)

{
  "text": "Willkommen zum Meeting...",
  "language": "de",
  "duration": 412.5,
  "segments": [
    {"id": 0, "start": 0.0, "end": 4.2, "text": "Willkommen zum Meeting", "speaker": "S1"},
    {"id": 1, "start": 4.2, "end": 9.8, "text": "...", "speaker": "S2"}
  ]
}
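
A verbose_json response can be post-processed locally, for example to produce subtitle cues that keep the speaker label. The sketch below assumes the segment fields shown in the sample response above (start, end, text, and an optional speaker). The function names are hypothetical, and the endpoint's own "srt" output may be formatted differently.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(transcript):
    """Render verbose_json segments as SRT cues.

    The speaker label is prefixed in brackets when a segment has one.
    """
    cues = []
    for index, seg in enumerate(transcript.get("segments", []), start=1):
        speaker = seg.get("speaker")
        text = f"[{speaker}] {seg['text']}" if speaker else seg["text"]
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Applied to the sample above, the first cue reads "[S1] Willkommen zum Meeting" and spans 00:00:00,000 to 00:00:04,200.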

Audio isolation

Removes background noise from a clip while preserving the foreground voice. Multipart upload; returns audio.

POST https://api.airforce/v1/audio/audio-isolation

Parameter | Type | Required | Description
model | string | Required | Isolation model ID.
file | binary | Required | Input audio.

curl https://api.airforce/v1/audio/audio-isolation \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=isolation-v1" \
  -F "file=@input.mp3" \
  --output clean.mp3
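
The multipart endpoints all take ordinary form fields plus one or more file parts. Python's standard library has no ready-made multipart encoder, so the sketch below builds the body by hand. The function name encode_multipart is hypothetical, and field names and filenames are assumed to be plain ASCII without quotes.

```python
import uuid


def encode_multipart(fields, files):
    """Encode form fields and files as a multipart/form-data body.

    fields: list of (name, value) pairs; repeat a name to send it more than once.
    files:  list of (name, filename, data_bytes, content_type) tuples.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    for name, filename, data, content_type in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    # The closing delimiter ends the body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body and header value can be passed as data and Content-Type to urllib.request.Request together with the Authorization header. Taking fields as a list of pairs rather than a dict allows a name to be repeated, which the dubbing and cloning endpoints rely on.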

Voice changer (speech-to-speech)

Takes input speech and re-renders it in a different voice while preserving timing and intonation.

POST https://api.airforce/v1/audio/voice-changer

Parameter | Type | Required | Description
model | string | Required | Voice-change model ID.
voice | string | Required | Target voice ID. Same catalog as TTS.
file | binary | Required | Input audio.
voice_settings | object | Optional | Optional ElevenLabs-shape settings (stability, similarity_boost, …).

curl https://api.airforce/v1/audio/voice-changer \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=voice-changer-v1" \
  -F "voice=21m00Tcm4TlvDq8ikWAM" \
  -F "file=@input.mp3" \
  --output transformed.mp3

Dubbing

Asynchronous multi-language dubbing. Returns a task ID immediately. Poll the status, then fetch the dubbed audio for each target language.

1. Create job

POST https://api.airforce/v1/audio/dubbing

Parameter | Type | Required | Description
model | string | Required | Dubbing model ID.
file | binary | Required | Source audio (mp3, wav, m4a). Audio is extracted automatically from mp4 video.
target_lang | string | Required | Target language code, ISO-639-1. Pass multiple via repeated form fields for multi-language output.
source_lang | string | Optional | Source language. Auto-detected when omitted.
num_speakers | integer | Optional | Hint for diarization. Auto when omitted.
watermark | boolean | Optional | Add an audible watermark to the output.

curl https://api.airforce/v1/audio/dubbing \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=dubbing-v1" \
  -F "file=@input.mp3" \
  -F "target_lang=de" \
  -F "target_lang=es" \
  -F "source_lang=en"

Response

{
  "task_id": "dub_01HXY...",
  "status": "queued"
}

2. Poll status

GET https://api.airforce/v1/audio/dubbing/:task_id

Response

{
  "task_id": "dub_01HXY...",
  "status": "completed",
  "progress": 100,
  "available_languages": ["de", "es"]
}

3. Download per language

GET https://api.airforce/v1/audio/dubbing/:task_id/audio/:lang

curl https://api.airforce/v1/audio/dubbing/dub_01HXY.../audio/de \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  --output german.mp3
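
The three steps can be tied together with a small polling loop. The sketch below is one possible shape, not the service's own client. The HTTP call is injected as fetch_status so the loop stays independent of any particular library, and the function name wait_for_dubbing is hypothetical. The reference only shows the "queued" and "completed" states, so the error states checked here ("failed", "errored") are assumptions.

```python
import time


def wait_for_dubbing(fetch_status, task_id, interval=5.0, timeout=1800.0,
                     sleep=time.sleep):
    """Poll a dubbing task until it completes and return its status document.

    fetch_status(task_id) must return the parsed JSON body of
    GET /v1/audio/dubbing/:task_id.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(task_id)
        state = status.get("status")
        if state == "completed":
            return status
        # Assumed names for terminal error states; the reference only
        # documents "queued" and "completed".
        if state in ("failed", "errored"):
            raise RuntimeError(f"dubbing task {task_id} ended with status {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dubbing task {task_id} still {state!r} after {timeout} s")
        sleep(interval)
```

Once the call returns, each entry in available_languages can be downloaded from /v1/audio/dubbing/:task_id/audio/:lang as in step 3.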

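Voice cloning

Clone a voice from short audio samples and reuse it on every speech endpoint. Voice cloning requires explicit consent: fetch the current consent text, hash it, and submit the hash with your samples.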

1. Fetch consent text

GET https://api.airforce/v1/voices/consent-text

Response

{
  "text": "I confirm that the voice samples I am uploading are either my own voice or a voice I have explicit permission to clone…",
  "hash": "9f4b0c8d2e…"
}

2. Create the clone

POST https://api.airforce/v1/voices/clone

Parameter | Type | Required | Description
name | string | Required | Public voice name shown in the library.
description | string | Optional | Optional free-text description.
consent_hash | string | Required | SHA-256 of the consent paragraph. Fetch the current text via GET /v1/voices/consent-text and pass its hash field.
files | binary | Required | 1–25 audio samples. Repeat the form field per file. Total ≤ 200 MB. 30 s – 3 min per clip works best.

curl https://api.airforce/v1/voices/clone \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "name=My voice" \
  -F "description=Calm, conversational" \
  -F "consent_hash=9f4b0c8d2e..." \
  -F "files=@sample1.mp3" \
  -F "files=@sample2.mp3"

Response

{
  "provider_voice_id": "voice_01HXY...",
  "name": "My voice",
  "description": "Calm, conversational",
  "created_at": "2026-05-06T22:30:00Z",
  "status": "active",
  "provider": "elevenlabs"
}
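
The consent hash can be checked locally before uploading samples. The sketch below assumes the hash is the lowercase hexadecimal SHA-256 digest of the UTF-8 encoded consent text. The reference does not state the encoding or any normalization, so treat a mismatch as a reason to investigate rather than as proof of an error. Both function names are hypothetical.

```python
import hashlib


def local_consent_hash(text):
    """SHA-256 of the consent text as a lowercase hex string.

    Assumption: the text is hashed as UTF-8 without normalization.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def matches_server_hash(consent):
    """Compare a locally computed hash with the "hash" field of the
    parsed GET /v1/voices/consent-text response."""
    return local_consent_hash(consent["text"]) == consent["hash"].lower()
```

Either way, the value to send as consent_hash is the hash field returned by the API, as the parameter table states.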

3. List your library

GET https://api.airforce/v1/voices/library

curl https://api.airforce/v1/voices/library \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"

Response format

Parameter | Type | Required | Description
voices[].provider_voice_id | string | Optional | Pass as "voice" on TTS / voice-changer endpoints.
voices[].status | string | Optional | "active" | "errored" | "deleting".
voices[].provider | string | Optional | Upstream that hosts the clone.
voices[].last_error | string | Optional | Set when status is "errored".

4. Update / delete

PATCH https://api.airforce/v1/voices/clone/:id
DELETE https://api.airforce/v1/voices/clone/:id

PATCH accepts name and description in a JSON body. DELETE removes the voice both locally and at the upstream provider.
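
No curl example is given for these two calls, so the sketch below shows how the requests might be built with Python's standard library. The helper names are hypothetical. It also assumes that :id is the provider_voice_id returned when the clone was created, which the reference does not state explicitly.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.airforce/v1"


def build_update_request(api_key, voice_id, name=None, description=None):
    """Build a PATCH request that changes a clone's name and/or description."""
    changes = {key: value
               for key, value in (("name", name), ("description", description))
               if value is not None}
    if not changes:
        raise ValueError("provide name, description, or both")
    return urllib.request.Request(
        f"{API_BASE}/voices/clone/{urllib.parse.quote(voice_id, safe='')}",
        data=json.dumps(changes).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


def build_delete_request(api_key, voice_id):
    """Build a DELETE request that removes a clone."""
    return urllib.request.Request(
        f"{API_BASE}/voices/clone/{urllib.parse.quote(voice_id, safe='')}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE",
    )
```

Either request can then be sent with urllib.request.urlopen.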


Notes

  • Audio responses are returned as raw bytes with the right Content-Type. PCM / µ-law formats are wrapped in a minimal WAV header so they're browser-playable as-is.
  • Multipart endpoints (transcriptions, isolation, voice-changer, dubbing, cloning) accept up to 200 MB per request.
  • Voice IDs work across providers: a cloned ElevenLabs voice can be passed straight to /v1/audio/voice-changer.
  • Cost is metered per character (TTS), per second (music / SFX / dubbing / changer), or per audio minute (transcription). Check X-Cost-Cents on the response.
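
Two of these notes lend themselves to a quick client-side check. The sketch below reads the WAV header of a pcm_* response with Python's wave module and parses the X-Cost-Cents header. Both function names are hypothetical. It assumes that the header records the real data length and that X-Cost-Cents carries a plain number. The wave module reads uncompressed PCM only, so it is not expected to open a ulaw_8000 response.

```python
import io
import wave


def describe_wav(audio_bytes):
    """Read basic stream parameters from WAV-wrapped PCM bytes."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate,
        }


def cost_cents(headers):
    """Return X-Cost-Cents as a float, or None when the header is absent.

    `headers` is any mapping with a .get method, such as response.headers
    from urllib.request.urlopen.
    """
    value = headers.get("X-Cost-Cents")
    return float(value) if value is not None else None
```

For a pcm_24000 response, describe_wav should report a sample_rate of 24000 if the format name reflects the sample rate.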