Api.Airforce
API REFERENCE

Audio

Text-to-speech, speech-to-text, music, sound effects, voice changing, dubbing, and voice cloning: one API key, every provider.

A single audio surface covers text-to-speech, transcription, music, sound effects, dubbing, voice changing, and voice cloning. The core endpoints are OpenAI-compatible, while richer extras (voice settings, speaker diarization, dubbing) are accepted wherever the upstream provider supports them.

List the available voices first; cloned voices you create appear in the same list and are used in the same way.

Endpoints in this section: /v1/audio/speech, /v1/audio/music, /v1/audio/sound-effects, /v1/audio/transcriptions, /v1/audio/audio-isolation, /v1/audio/voice-changer, /v1/audio/dubbing, /v1/audio/voices, plus /v1/voices/* for cloning.

Text-to-speech

Synthesise speech from text. Returns raw audio bytes with the matching content type (for example audio/mpeg). PCM and µ-law formats include a WAV header so they can be played in any browser.

POST https://api.airforce/v1/audio/speech

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Required | TTS model ID. See /v1/models for IDs with input_modalities containing "text" and output_modalities containing "audio". |
| input | string | Required | Text to synthesise. Long inputs are chunked automatically. |
| voice | string | Required | Voice ID. Use GET /v1/audio/voices to list options. Cloned voices appear here too. |
| response_format | string | Optional | "mp3" (default), "mp3_44100_128", "mp3_44100_192", "pcm_22050", "pcm_24000", "pcm_44100", "ulaw_8000". |
| speed | float | Optional | 0.25 – 4.0. OpenAI-compatible. Some upstream providers ignore this. |
| voice_settings | object | Optional | ElevenLabs-shape: { stability: 0–1, similarity_boost: 0–1, style: 0–1, use_speaker_boost: bool }. |
| language_code | string | Optional | ISO-639-1 hint, e.g. "de", "en", "ja". Improves prosody for multilingual models. |
| seed | integer | Optional | Reproducibility seed where supported. |

Example

curl https://api.airforce/v1/audio/speech \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "elevenlabs-multilingual-v2",
    "input": "Willkommen bei Airforce.",
    "voice": "21m00Tcm4TlvDq8ikWAM",
    "response_format": "mp3_44100_128",
    "voice_settings": {"stability": 0.6, "similarity_boost": 0.8}
  }'
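
The same call can be sketched in Python with only the standard library. This is a minimal illustration rather than an official client: the helper names build_speech_request and synthesize_to_file are hypothetical, and the base URL and field names are taken from the curl example and the parameter table above. Optional parameters are passed through as keyword arguments, and any left as None are omitted from the body.

```python
import json
import urllib.request

API_BASE = "https://api.airforce/v1"  # base URL as used in the curl examples


def build_speech_request(api_key, model, text, voice, **options):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {"model": model, "input": text, "voice": voice}
    # Optional fields (response_format, speed, voice_settings, ...) are
    # passed through unchanged; None values are dropped.
    payload.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        f"{API_BASE}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(request, path):
    """Send the request and write the raw audio bytes to `path`."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
    return path
```

Keeping request construction separate from sending makes the payload easy to inspect before any credit is spent; synthesize_to_file(build_speech_request(...), "speech.mp3") then performs the actual call.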

List voices

Returns every voice you can pass as the voice parameter on TTS, voice-over, and audiobook calls. Cloned voices are also returned here once their status is active.

GET https://api.airforce/v1/audio/voices

curl https://api.airforce/v1/audio/voices \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"

Response structure

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| voices[] | array | Optional | List of voice descriptors. |
| voices[].voice_id | string | Optional | Provider-native voice identifier. The field is voice_id (not id). Pass this value as "voice". |
| voices[].name | string | Optional | Human-readable name. |
| voices[].description | string | Optional | Short description, when the upstream exposes one. |
| voices[].category | string | Optional | "premade", "cloned", or "professional". |
| voices[].preview_url | string | Optional | Short audio sample, when the upstream exposes one. |
| voices[].labels | object | Optional | Free-form metadata: gender, language, accent, age, use case. |
| live | boolean | Optional | true when the catalog came from a live upstream call; false when served from the built-in premade fallback. |

{
  "voices": [
    {
      "voice_id": "CwhRBWXzGAHq8TQ4Fs17",
      "name": "Roger - Laid-Back, Casual, Resonant",
      "description": "Easy going and perfect for casual conversations.",
      "preview_url": "https://.../58ee3ff5.mp3",
      "category": "premade",
      "labels": {"accent": "american", "gender": "male", "language": "en", "use_case": "conversational"}
    }
  ],
  "live": true
}
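
Because labels is free-form, picking a voice usually means filtering the catalog on the client. The sketch below assumes the response has been parsed into a dict shaped like the example above; find_voices is a hypothetical helper name, and label keys such as language or gender are only present when the upstream supplies them.

```python
def find_voices(catalog, category=None, **labels):
    """Return the voice_id of every voice matching the category and labels.

    `catalog` is the parsed JSON body of GET /v1/audio/voices.
    """
    matches = []
    for voice in catalog.get("voices", []):
        if category is not None and voice.get("category") != category:
            continue
        voice_labels = voice.get("labels") or {}
        # Every requested label must be present with exactly that value.
        if all(voice_labels.get(key) == value for key, value in labels.items()):
            matches.append(voice["voice_id"])
    return matches
```

For example, find_voices(catalog, category="premade", language="en") returns the IDs to pass as "voice" on the speech endpoint.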

Music generation

Generate full music tracks from a text prompt. Returns binary audio.

POST https://api.airforce/v1/audio/music

This endpoint serves the native music models (e.g. music-v1). Suno models (suno-*) are not available here and return provider_not_supported; call them through the /v1/images/generations endpoint instead (see the Media reference).

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Required | Music model ID, e.g. "music-v1". |
| prompt | string | Required | Style / mood / structure description. |
| duration_seconds | integer | Optional | Track length. Range depends on the model (typically 15–120 s). |
| response_format | string | Optional | "mp3" (default) or provider-native. |
| instrumental | boolean | Optional | When true, suppresses vocals. |
| style | string | Optional | Genre tag list, e.g. "EDM, bass, dark". |

curl https://api.airforce/v1/audio/music \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output track.mp3 \
  -d '{
    "model": "music-v1",
    "prompt": "Lofi hip-hop beat with soft piano and rain",
    "duration_seconds": 60,
    "instrumental": true
  }'

Sound effects

Short SFX from a text prompt. Same request shape as music, only with a shorter duration.

POST https://api.airforce/v1/audio/sound-effects

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Required | SFX model ID. |
| prompt | string | Required | Effect description, e.g. "thunder rumble fading into rain". |
| duration_seconds | integer | Optional | Length, typically 0.5–22 s. |
| response_format | string | Optional | "mp3" (default). |

curl https://api.airforce/v1/audio/sound-effects \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output thunder.mp3 \
  -d '{
    "model": "sfx-v1",
    "prompt": "Distant thunder rolling, then rain",
    "duration_seconds": 8
  }'

Transcriptions (speech-to-text)

Multipart upload of an audio file. Returns the transcribed text.

POST https://api.airforce/v1/audio/transcriptions

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Required | Transcription model ID. See the live model list for valid IDs. |
| file | binary | Required | Audio file. Supports mp3, wav, m4a, flac, ogg, webm. |
| language_code | string | Optional | ISO-639-1 language hint (also accepted as "language"). Auto-detected when omitted. |
| diarize | boolean | Optional | Separate speakers. When true, each word carries a speaker_id. |
| num_speakers | integer | Optional | Expected speaker count, used together with diarize. |
| tag_audio_events | boolean | Optional | Mark non-speech events (laughter, silence, music) in the output. |
| timestamps_granularity | string | Optional | "word" (default) or "character". |
| additional_formats | string | Optional | Request extra rendered outputs (e.g. srt / vtt) alongside the JSON. |

curl https://api.airforce/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "file=@YOUR_AUDIO_FILE.mp3" \
  -F "model=elevenlabs-scribe" \
  -F "language_code=de" \
  -F "diarize=true"
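
Python's standard library has no multipart encoder, so a client without third-party dependencies has to build the form body itself. The following is a rough sketch under stated assumptions: encode_multipart and build_transcription_request are hypothetical names, booleans are serialised as "true"/"false" to mirror the curl example, and the file part is labelled application/octet-stream, which the service may or may not require to be a specific audio type.

```python
import urllib.request
import uuid

API_BASE = "https://api.airforce/v1"  # base URL as used in the curl examples


def encode_multipart(fields, files):
    """Encode text fields and files as a multipart/form-data body.

    `files` maps a form field name to a (filename, content_bytes) tuple.
    Returns (content_type_header, body_bytes).
    """
    boundary = uuid.uuid4().hex
    body = bytearray()
    for name, value in fields.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # same form as the curl example
        body += f"--{boundary}\r\n".encode()
        body += f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        body += f"{value}\r\n".encode()
    for name, (filename, content) in files.items():
        disposition = f'form-data; name="{name}"; filename="{filename}"'
        body += f"--{boundary}\r\n".encode()
        body += f"Content-Disposition: {disposition}\r\n".encode()
        body += b"Content-Type: application/octet-stream\r\n\r\n"  # assumed type
        body += content + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return f"multipart/form-data; boundary={boundary}", bytes(body)


def build_transcription_request(api_key, model, filename, audio_bytes, **options):
    """Build (but do not send) a POST request for /v1/audio/transcriptions."""
    content_type, body = encode_multipart(
        {"model": model, **options}, {"file": (filename, audio_bytes)}
    )
    return urllib.request.Request(
        f"{API_BASE}/audio/transcriptions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
```

The resulting request can be sent with urllib.request.urlopen and the JSON body parsed with json.load. The same encoder would apply to the other multipart endpoints in this section.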

Response structure

{
  "language_code": "deu",
  "language_probability": 0.98,
  "text": "Willkommen zum Meeting...",
  "words": [
    {"text": "Willkommen", "start": 0.0, "end": 0.62, "type": "word", "logprob": -0.08, "speaker_id": "speaker_0"},
    {"text": " ", "start": 0.62, "end": 0.62, "type": "spacing", "logprob": 0.0}
  ],
  "audio_duration_secs": 412.5,
  "transcription_id": "tx_01HXY..."
}

The response follows the native shape of the upstream provider (ElevenLabs Scribe), not that of OpenAI Whisper: tokens come back as a flat words[] array (each with a type of word/spacing and a logprob), not as segments[]. The duration is audio_duration_secs, and language_code is ISO-639-3 (e.g. eng, deu). A per-word speaker_id is present only when you pass diarize=true.
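
Since there is no segments[] array, callers that want speaker turns have to derive them from the flat words[] list. One way to do that is sketched below. speaker_turns is a hypothetical helper; it assumes that tokens without a speaker_id (such as the spacing token in the example) belong to the turn that is currently open, and it treats every token whose type is not "word" the same way.

```python
def speaker_turns(transcript):
    """Group the flat words[] array into consecutive per-speaker turns.

    Returns a list of dicts with speaker, start, end and text. Without
    diarize=true every word has no speaker_id, so a single turn results.
    """
    turns = []
    for token in transcript.get("words", []):
        is_word = token.get("type") == "word"
        speaker = token.get("speaker_id")
        # A new turn starts on the first word and on every speaker change.
        if is_word and (not turns or turns[-1]["speaker"] != speaker):
            turns.append(
                {"speaker": speaker, "start": token["start"],
                 "end": token["end"], "text": ""}
            )
        if not turns:
            continue  # skip spacing that precedes the first word
        turns[-1]["text"] += token["text"]
        if is_word:
            turns[-1]["end"] = token["end"]
    for turn in turns:
        turn["text"] = turn["text"].strip()
    return turns
```

Each turn keeps the start of its first word and the end of its last word, which is usually enough to render a simple speaker-labelled transcript.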


Audio isolation

Remove background noise from a clip while preserving the foreground voice. Multipart upload; returns audio.

POST https://api.airforce/v1/audio/audio-isolation

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Required | Isolation model ID. |
| file | binary | Required | Input audio. |

curl https://api.airforce/v1/audio/audio-isolation \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=isolation-v1" \
  -F "file=@YOUR_AUDIO_FILE.mp3" \
  --output clean.mp3

Voice changer (speech-to-speech)

Takes input speech and re-renders it in a different voice, preserving timing and inflection.

POST https://api.airforce/v1/audio/voice-changer

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Required | Voice-change model ID. |
| voice | string | Required | Target voice ID. Same catalog as TTS. |
| file | binary | Required | Input audio. |
| voice_settings | object | Optional | ElevenLabs-shape settings (stability, similarity_boost, …). |

curl https://api.airforce/v1/audio/voice-changer \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=voice-changer-v1" \
  -F "voice=21m00Tcm4TlvDq8ikWAM" \
  -F "file=@YOUR_AUDIO_FILE.mp3" \
  --output transformed.mp3

Dubbing

Asynchronous dubbing into a single target language. Returns a dubbing_id immediately; poll the status until it is "dubbed", then download the dubbed audio for that language.

1. Create job

POST https://api.airforce/v1/audio/dubbing

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Required | Dubbing model ID. |
| file | binary | Required | Source audio or video (mp3, wav, m4a, mp4; audio is extracted automatically). Alternatively pass source_url. |
| target_lang | string | Required | Target language code (ISO-639-1). One language per job; repeating the field does not add languages. |
| source_lang | string | Optional | Source language. "auto" or omit for auto-detect. |
| num_speakers | integer | Optional | Hint for diarization. Auto when omitted. |
| drop_background_audio | boolean | Optional | Remove background music / noise from the dub. |
| watermark | boolean | Optional | Add an audible watermark to the output. |

curl https://api.airforce/v1/audio/dubbing \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=dubbing-v1" \
  -F "file=@english.mp4" \
  -F "target_lang=de" \
  -F "source_lang=en"

{
  "dubbing_id": "abc123def456",
  "expected_duration_sec": 42.5
}

2. Poll status

GET https://api.airforce/v1/audio/dubbing/:dubbing_id

The status is passed through verbatim from the provider. status is "dubbing" while the job is running and "dubbed" when it is finished (not "completed"). Languages are listed under target_languages (not available_languages), and there is no progress field.

{
  "dubbing_id": "abc123def456",
  "status": "dubbed",
  "source_language": "en",
  "target_languages": ["de"],
  "media_metadata": {"duration": 42.5, "content_type": "video/mp4"},
  "name": "english.mp4",
  "created_at": "2026-05-06T22:30:00Z",
  "editable": false,
  "error": null
}

3. Download per language

GET https://api.airforce/v1/audio/dubbing/:dubbing_id/audio/:lang

curl https://api.airforce/v1/audio/dubbing/abc123def456/audio/de \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  --output german.mp3
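
The poll step can be sketched as a small loop that waits for the "dubbed" status before the download in step 3 is attempted. This is an illustration, not a reference client: wait_for_dub and http_status_fetcher are hypothetical names, the polling interval and timeout are arbitrary defaults, and treating a non-null error field as failure is an assumption, since the source only shows error as null on success.

```python
import json
import time
import urllib.request

API_BASE = "https://api.airforce/v1"  # base URL as used in the curl examples


def http_status_fetcher(api_key):
    """Return a callable that GETs /v1/audio/dubbing/:dubbing_id as a dict."""
    def fetch(dubbing_id):
        request = urllib.request.Request(
            f"{API_BASE}/audio/dubbing/{dubbing_id}",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch


def wait_for_dub(fetch, dubbing_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until status is "dubbed"; raise on a reported error or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(dubbing_id)
        if job.get("error"):  # assumption: non-null error means the job failed
            raise RuntimeError(f"dubbing failed: {job['error']}")
        if job.get("status") == "dubbed":  # the provider's literal "done" value
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dubbing {dubbing_id} not finished after {timeout}s")
        sleep(interval)
```

A typical call would be wait_for_dub(http_status_fetcher(api_key), dubbing_id), followed by a GET on /v1/audio/dubbing/:dubbing_id/audio/:lang for the language listed in target_languages. Passing the fetch function in keeps the loop testable without network access.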

Voice cloning

Clone a voice from short audio clips and reuse it on any speech endpoint. Voice cloning requires explicit consent. Fetch the current consent text and submit its SHA-256 hash (returned in the hash field) with your samples.

1. Fetch consent text

GET https://api.airforce/v1/voices/consent-text

{
  "text": "I confirm that the voice samples I am uploading are either my own voice or a voice I have explicit permission to clone…",
  "hash": "9f4b0c8d2e…"
}
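
The consent hash can be recomputed locally as a sanity check before the clone request is sent. The sketch below assumes the hash field is the lowercase hex SHA-256 of the UTF-8 bytes of text with no extra normalisation; the source only states that it is a SHA-256 of the consent paragraph, so a mismatch may simply mean the server hashes a slightly different byte sequence. For that reason the hypothetical helper hash_to_submit always returns the server-provided value and reports agreement separately.

```python
import hashlib


def consent_hash(text):
    """Lowercase hex SHA-256 of the UTF-8 encoded text (assumed encoding)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def hash_to_submit(consent):
    """Pick the consent_hash value for POST /v1/voices/clone.

    `consent` is the parsed JSON from GET /v1/voices/consent-text. Returns
    (server_hash, matches), where `matches` tells whether the local
    recomputation agrees with the server-provided hash.
    """
    server_hash = consent["hash"]
    return server_hash, consent_hash(consent["text"]) == server_hash
```

Showing the consent text to the user before submitting its hash keeps the confirmation explicit.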

2. Create the clone

POST https://api.airforce/v1/voices/clone

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Required | Public voice name shown in the library. |
| description | string | Optional | Free-text description. |
| consent_hash | string | Required | SHA-256 of the consent paragraph. Fetch the current text via GET /v1/voices/consent-text and pass its hash field. |
| files | binary | Required | 1–25 audio samples. Repeat the form field per file. Total ≤ 200 MB. 30 s – 3 min per clip works best. |

curl https://api.airforce/v1/voices/clone \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "name=My voice" \
  -F "description=Calm, conversational" \
  -F "consent_hash=9f4b0c8d2e..." \
  -F "files=@YOUR_SAMPLE_1.mp3" \
  -F "files=@YOUR_SAMPLE_2.mp3"

{
  "voice_id": "voice_01HXY...",
  "name": "My voice",
  "status": "active",
  "created_at": "2026-05-06T22:30:00Z"
}

Note the field names: the create response returns the new voice as voice_id, while GET /v1/voices/library lists clones under provider_voice_id. Both hold the same identifier, which is the value you pass as voice.

3. List your library

GET https://api.airforce/v1/voices/library

curl https://api.airforce/v1/voices/library \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| voices[].provider_voice_id | string | Optional | Pass as "voice" on TTS / voice-changer endpoints. |
| voices[].status | string | Optional | "active", "errored", or "deleting". |
| voices[].provider | string | Optional | Upstream that hosts the clone. |
| voices[].last_error | string | Optional | Set when status is "errored". |
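
A library response can be split into clones that are ready to use and clones that need attention. The helpers below are a sketch with hypothetical names, assuming the response has been parsed into a dict with a top-level voices list carrying the fields in the table above.

```python
def active_clone_ids(library):
    """Return provider_voice_id for every clone whose status is "active"."""
    return [
        voice["provider_voice_id"]
        for voice in library.get("voices", [])
        if voice.get("status") == "active"
    ]


def errored_clones(library):
    """Map provider_voice_id to last_error for clones in the "errored" state."""
    return {
        voice.get("provider_voice_id"): voice.get("last_error")
        for voice in library.get("voices", [])
        if voice.get("status") == "errored"
    }
```

The IDs returned by active_clone_ids are the values to pass as "voice" on the speech and voice-changer endpoints.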

4. Update / delete

PATCH https://api.airforce/v1/voices/clone/:id
DELETE https://api.airforce/v1/voices/clone/:id

PATCH accepts name and description in a JSON body. DELETE removes the voice both locally and at the upstream provider.


Notes

  • Audio responses are returned as raw bytes with the appropriate Content-Type. PCM / µ-law formats are wrapped in a minimal WAV header so they play in the browser without modification.
  • Multipart endpoints (transcriptions, isolation, voice-changer, dubbing, cloning) accept up to 200 MB per request.
  • Voice IDs work across providers: a cloned ElevenLabs voice can be passed directly to /v1/audio/voice-changer.
  • Costs are metered per character (TTS), per second (music / SFX / dubbing / voice-changer), or per audio minute (transcription) and are deducted from your balance. Audio endpoints do not send an X-Cost-Cents response header; track your spend in the usage log of your dashboard.
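
As an illustration of the WAV-wrapping note above, a pcm_* response can be inspected with Python's built-in wave module. This sketch assumes the header is a standard RIFF/WAVE header with valid size fields, which the source does not spell out. describe_wav is a hypothetical helper, and it applies to the PCM formats only, because the wave module rejects µ-law encoded files.

```python
import io
import wave


def describe_wav(audio_bytes):
    """Read the basic stream parameters from WAV-wrapped PCM bytes."""
    # Cheap signature check before handing the data to the parser.
    if audio_bytes[:4] != b"RIFF" or audio_bytes[8:12] != b"WAVE":
        raise ValueError("payload does not start with a RIFF/WAVE header")
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
            "frames": wav.getnframes(),
        }
```

Checking the reported sample rate against the requested response_format (for example 24000 for "pcm_24000") is a quick way to confirm that the upstream honoured the format.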