API REFERENCE

Audio

Text-to-speech, speech-to-text, music, sound effects, voice changing, dubbing, and voice cloning: one API key, every provider.

A single audio surface covers text-to-speech, transcription, music, sound effects, dubbing, voice changing, and voice cloning. The core endpoints are OpenAI-compatible, while richer extras (voice settings, speaker diarization, dubbing) are accepted wherever the upstream provider supports them.

List the available voices first; cloned voices you create appear in the same list and are used the same way.

Endpoints in this section: /v1/audio/speech, /music, /sound-effects, /transcriptions, /audio-isolation, /voice-changer, /dubbing, /voices, plus /v1/voices/* for cloning.

Text to speech

Synthesise speech from text. Returns raw audio bytes with a matching Content-Type (e.g. audio/mpeg). PCM and μ-law formats include a WAV header so they play in any browser.

POST https://api.airforce/v1/audio/speech

TTS models

…· live

Parameter	Type	Required	Description
model	string	Required	TTS model ID. See /v1/models for IDs with input_modalities containing "text" and output_modalities containing "audio".
input	string	Required	Text to synthesise. Long inputs are chunked automatically.
voice	string	Required	Voice ID. Use GET /v1/audio/voices to list options. Cloned voices appear here too.
response_format	string	Optional	"mp3" (default), "mp3_44100_128", "mp3_44100_192", "pcm_22050", "pcm_24000", "pcm_44100", "ulaw_8000".
speed	float	Optional	0.25 – 4.0. OpenAI-compatible. Some upstream providers ignore this.
voice_settings	object	Optional	ElevenLabs-shape: { stability: 0–1, similarity_boost: 0–1, style: 0–1, use_speaker_boost: bool }.
language_code	string	Optional	ISO-639-1 hint, e.g. "de", "en", "ja". Improves prosody for multilingual models.
seed	integer	Optional	Reproducibility seed where supported.

Example

curl https://api.airforce/v1/audio/speech \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "elevenlabs-multilingual-v2",
    "input": "Willkommen bei Airforce.",
    "voice": "21m00Tcm4TlvDq8ikWAM",
    "response_format": "mp3_44100_128",
    "voice_settings": {"stability": 0.6, "similarity_boost": 0.8}
  }'
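
The same call can be sketched in Python using only the standard library. This is a minimal sketch rather than an official client: the helper names `build_speech_request` and `synthesize_to_file` are hypothetical, and the base URL, model, and voice values are copied from the curl example above. Building the request is kept separate from sending it, so the payload can be inspected without spending balance.

```python
import json
import urllib.request

API_BASE = "https://api.airforce/v1"  # base URL as used in the curl examples


def build_speech_request(api_key, model, text, voice, **extras):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {"model": model, "input": text, "voice": voice}
    # Optional parameters (response_format, speed, voice_settings, ...) are
    # passed through unchanged; None values are dropped.
    payload.update({k: v for k, v in extras.items() if v is not None})
    return urllib.request.Request(
        f"{API_BASE}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(request, path):
    """Send the request and write the raw audio bytes to `path`."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


request = build_speech_request(
    "sk-air-YOUR_API_KEY",
    model="elevenlabs-multilingual-v2",
    text="Willkommen bei Airforce.",
    voice="21m00Tcm4TlvDq8ikWAM",
    response_format="mp3_44100_128",
    voice_settings={"stability": 0.6, "similarity_boost": 0.8},
)
# synthesize_to_file(request, "speech.mp3")  # uncomment to perform the call
```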

List voices

Returns every voice you can pass as the "voice" parameter on TTS / voice-over / audiobook calls. Cloned voices also appear here once their status is active.

GET https://api.airforce/v1/audio/voices

curl https://api.airforce/v1/audio/voices \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"

Response structure

Parameter	Type	Required	Description
voices[]	array	Optional	List of voice descriptors.
voices[].voice_id	string	Optional	Provider-native voice identifier — the field is voice_id (not id). Pass this value as "voice".
voices[].name	string	Optional	Human-readable name.
voices[].description	string	Optional	Short description, when the upstream exposes one.
voices[].category	string	Optional	"premade" | "cloned" | "professional".
voices[].preview_url	string	Optional	Short audio sample, when the upstream exposes one.
voices[].labels	object	Optional	Free-form metadata: gender, language, accent, age, use case.
live	boolean	Optional	true when the catalog came from a live upstream call; false when served from the built-in premade fallback.

{
  "voices": [
    {
      "voice_id": "CwhRBWXzGAHq8TQ4Fs17",
      "name": "Roger - Laid-Back, Casual, Resonant",
      "description": "Easy going and perfect for casual conversations.",
      "preview_url": "https://.../58ee3ff5.mp3",
      "category": "premade",
      "labels": {"accent": "american", "gender": "male", "language": "en", "use_case": "conversational"}
    }
  ],
  "live": true
}
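
Because labels are free-form metadata, picking a voice usually means filtering the catalog on the client side. The following is a small sketch of that filtering, assuming the response shape shown above; `find_voices` is a hypothetical helper, and the catalog literal is an abbreviated copy of the sample response.

```python
def find_voices(catalog, category=None, **labels):
    """Return voice_id values whose category and labels match the filters."""
    matches = []
    for voice in catalog.get("voices", []):
        if category is not None and voice.get("category") != category:
            continue
        # labels is optional and free-form, so a missing key never matches.
        voice_labels = voice.get("labels") or {}
        if all(voice_labels.get(key) == value for key, value in labels.items()):
            matches.append(voice["voice_id"])
    return matches


catalog = {
    "voices": [
        {
            "voice_id": "CwhRBWXzGAHq8TQ4Fs17",
            "name": "Roger - Laid-Back, Casual, Resonant",
            "category": "premade",
            "labels": {"accent": "american", "gender": "male", "language": "en"},
        }
    ],
    "live": True,
}

english_voices = find_voices(catalog, category="premade", language="en")
```

Each returned value can be passed as "voice" on the speech and voice-changer endpoints.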

Music generation

Generate a full music track from a text prompt. Returns binary audio.

POST https://api.airforce/v1/audio/music

This endpoint serves native music models (e.g. music-v1). Suno models (suno-*) are not available here and return provider_not_supported; call them through the /v1/images/generations endpoint instead (see the Media reference).

Parameter	Type	Required	Description
model	string	Required	Music model ID, e.g. "music-v1".
prompt	string	Required	Style / mood / structure description.
duration_seconds	integer	Optional	Track length. Range depends on the model (typically 15–120 s).
response_format	string	Optional	"mp3" (default) or provider-native.
instrumental	boolean	Optional	When true, suppresses vocals.
style	string	Optional	Optional genre tag list, e.g. "EDM, bass, dark".

curl https://api.airforce/v1/audio/music \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output track.mp3 \
  -d '{
    "model": "music-v1",
    "prompt": "Lofi hip-hop beat with soft piano and rain",
    "duration_seconds": 60,
    "instrumental": true
  }'

Sound effects

Short SFX from a text prompt. Same shape as music, just shorter durations.

POST https://api.airforce/v1/audio/sound-effects

Parameter	Type	Required	Description
model	string	Required	SFX model ID.
prompt	string	Required	Effect description, e.g. "thunder rumble fading into rain".
duration_seconds	integer	Optional	Length, typically 0.5–22 s.
response_format	string	Optional	"mp3" (default).

curl https://api.airforce/v1/audio/sound-effects \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output thunder.mp3 \
  -d '{
    "model": "sfx-v1",
    "prompt": "Distant thunder rolling, then rain",
    "duration_seconds": 8
  }'

Transcription (speech-to-text)

Multipart upload of an audio file. Returns the transcribed text.

POST https://api.airforce/v1/audio/transcriptions

Transcription models

…· live

Parameter	Type	Required	Description
model	string	Required	Transcription model ID. See the live list above for valid IDs.
file	binary	Required	Audio file. Supports mp3, wav, m4a, flac, ogg, webm.
language_code	string	Optional	ISO-639-1 language hint (also accepted as "language"). Auto-detected when omitted.
diarize	boolean	Optional	Separate speakers. When true, each word carries a speaker_id.
num_speakers	integer	Optional	Expected speaker count, used together with diarize.
tag_audio_events	boolean	Optional	Mark non-speech events (laughter, silence, music) in the output.
timestamps_granularity	string	Optional	"word" (default) or "character".
additional_formats	string	Optional	Request extra rendered outputs (e.g. srt / vtt) alongside the JSON.

curl https://api.airforce/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "file=@YOUR_AUDIO_FILE.mp3" \
  -F "model=elevenlabs-scribe" \
  -F "language_code=de" \
  -F "diarize=true"

Response structure

{
  "language_code": "deu",
  "language_probability": 0.98,
  "text": "Willkommen zum Meeting...",
  "words": [
    {"text": "Willkommen", "start": 0.0, "end": 0.62, "type": "word", "logprob": -0.08, "speaker_id": "speaker_0"},
    {"text": " ", "start": 0.62, "end": 0.62, "type": "spacing", "logprob": 0.0}
  ],
  "audio_duration_secs": 412.5,
  "transcription_id": "tx_01HXY..."
}

The response follows the upstream provider's native shape (ElevenLabs Scribe), not OpenAI Whisper's: tokens come back as a flat words[] array (each with a type of word/spacing and a logprob), not as segments[]. The duration is audio_duration_secs, and language_code is ISO-639-3 (e.g. eng, deu). The per-word speaker_id is present only when you pass diarize=true.
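
Since there is no segments[] array, readable speaker turns have to be assembled from words[] on the client. The sketch below shows one way to do that; `speaker_turns` is a hypothetical helper, and the rule that spacing tokens inherit the previous word's speaker is an assumption based on the sample response, where the spacing token has no speaker_id. The tokens after the first two are invented for illustration.

```python
def speaker_turns(words):
    """Collapse a flat words[] array into (speaker_id, text) turns."""
    turns = []
    current_speaker = None
    for token in words:
        # Spacing tokens carry no speaker_id in the sample response, so they
        # inherit the speaker of the preceding token (assumption).
        speaker = token.get("speaker_id", current_speaker)
        if turns and speaker == turns[-1][0]:
            turns[-1][1].append(token["text"])
        else:
            turns.append((speaker, [token["text"]]))
        current_speaker = speaker
    return [(speaker, "".join(parts).strip()) for speaker, parts in turns]


words = [
    {"text": "Willkommen", "type": "word", "speaker_id": "speaker_0"},
    {"text": " ", "type": "spacing"},
    {"text": "zum", "type": "word", "speaker_id": "speaker_0"},
    {"text": " ", "type": "spacing"},
    {"text": "Danke", "type": "word", "speaker_id": "speaker_1"},
]

turns = speaker_turns(words)
```

Without diarize=true no token has a speaker_id, so the helper returns a single turn whose speaker is None.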

Audio isolation

Remove background noise from a clip while preserving the foreground voice. Multipart upload; returns audio.

POST https://api.airforce/v1/audio/audio-isolation

Parameter	Type	Required	Description
model	string	Required	Isolation model ID.
file	binary	Required	Input audio.

curl https://api.airforce/v1/audio/audio-isolation \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=isolation-v1" \
  -F "file=@YOUR_AUDIO_FILE.mp3" \
  --output clean.mp3

Voice changer (speech-to-speech)

Take input speech and re-render it in a different voice while preserving timing and inflection.

POST https://api.airforce/v1/audio/voice-changer

Parameter	Type	Required	Description
model	string	Required	Voice-change model ID.
voice	string	Required	Target voice ID. Same catalog as TTS.
file	binary	Required	Input audio.
voice_settings	object	Optional	Optional ElevenLabs-shape settings (stability, similarity_boost, …).

curl https://api.airforce/v1/audio/voice-changer \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=voice-changer-v1" \
  -F "voice=21m00Tcm4TlvDq8ikWAM" \
  -F "file=@YOUR_AUDIO_FILE.mp3" \
  --output transformed.mp3

Dubbing

Asynchronous dubbing into one target language. Returns a dubbing_id immediately; poll until status is "dubbed", then download the dubbed audio for that language.

1. Create job

POST https://api.airforce/v1/audio/dubbing

Parameter	Type	Required	Description
model	string	Required	Dubbing model ID.
file	binary	Required	Source audio or video (mp3, wav, m4a, mp4 — audio is extracted automatically). Alternatively pass source_url.
target_lang	string	Required	Target language code (ISO-639-1). One language per job — repeating the field does not add languages.
source_lang	string	Optional	Source language. "auto" or omit for auto-detect.
num_speakers	integer	Optional	Hint for diarization. Auto when omitted.
drop_background_audio	boolean	Optional	Remove background music / noise from the dub.
watermark	boolean	Optional	Add an audible watermark to the output.

curl https://api.airforce/v1/audio/dubbing \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=dubbing-v1" \
  -F "file=@english.mp4" \
  -F "target_lang=de" \
  -F "source_lang=en"

{
  "dubbing_id": "abc123def456",
  "expected_duration_sec": 42.5
}

2. Poll status

GET https://api.airforce/v1/audio/dubbing/:dubbing_id

The status is forwarded verbatim from the provider. status shows "dubbing" while the job runs and "dubbed" when it is ready (not "completed"). Languages are listed under target_languages (not available_languages), and there is no progress field.

{
  "dubbing_id": "abc123def456",
  "status": "dubbed",
  "source_language": "en",
  "target_languages": ["de"],
  "media_metadata": {"duration": 42.5, "content_type": "video/mp4"},
  "name": "english.mp4",
  "created_at": "2026-05-06T22:30:00Z",
  "editable": false,
  "error": null
}
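
A polling loop over the status endpoint above might look like the following sketch. The names `fetch_dubbing_status` and `wait_until_dubbed`, the 5-second interval, and the 10-minute timeout are all assumptions chosen for illustration. The reference does not document a failure status value, so the loop only treats a non-null error field as failure.

```python
import json
import time
import urllib.request

API_BASE = "https://api.airforce/v1"  # base URL as used in the curl examples


def fetch_dubbing_status(api_key, dubbing_id):
    """GET /v1/audio/dubbing/:dubbing_id and return the parsed JSON body."""
    request = urllib.request.Request(
        f"{API_BASE}/audio/dubbing/{dubbing_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_until_dubbed(fetch, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call fetch() until status is "dubbed"; raise on error or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        if job.get("status") == "dubbed":  # the ready state, not "completed"
            return job
        if job.get("error"):
            raise RuntimeError(f"dubbing failed: {job['error']}")
        if time.monotonic() >= deadline:
            raise TimeoutError("dubbing job did not finish in time")
        sleep(interval)


# Example call (performs network requests):
# job = wait_until_dubbed(
#     lambda: fetch_dubbing_status("sk-air-YOUR_API_KEY", "abc123def456")
# )
```

Passing the fetch step in as a callable keeps the loop testable without network access.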

3. Download per language

GET https://api.airforce/v1/audio/dubbing/:dubbing_id/audio/:lang

curl https://api.airforce/v1/audio/dubbing/abc123def456/audio/de \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  --output german.mp3

Voice cloning

Clone a voice from short audio samples and reuse it on every speech endpoint. Voice cloning requires explicit consent: fetch the current consent text, hash it, and submit the hash with your samples.

1. Fetch consent text

GET https://api.airforce/v1/voices/consent-text

{
  "text": "I confirm that the voice samples I am uploading are either my own voice or a voice I have explicit permission to clone…",
  "hash": "9f4b0c8d2e…"
}
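
The consent response above already carries a hash field, so the simplest route is to forward that value. If you prefer to hash the text yourself, the sketch below shows one way. It assumes the hash is the lowercase hex SHA-256 digest of the UTF-8 encoded text with no normalization, which the reference does not spell out; the helper names are hypothetical.

```python
import hashlib


def local_consent_hash(text):
    """SHA-256 hex digest of the consent text (UTF-8 encoding assumed)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def pick_consent_hash(consent):
    """Prefer the server's hash field; fall back to hashing the text locally."""
    return consent.get("hash") or local_consent_hash(consent["text"])
```

Comparing `local_consent_hash(consent["text"])` with `consent["hash"]` once is a quick way to check whether the encoding assumption holds before relying on the local digest.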

2. Create the clone

POST https://api.airforce/v1/voices/clone

Parameter	Type	Required	Description
name	string	Required	Public voice name shown in the library.
description	string	Optional	Optional free-text description.
consent_hash	string	Required	SHA-256 of the consent paragraph. Fetch the current text via GET /v1/voices/consent-text and pass its hash field.
files	binary	Required	1–25 audio samples. Repeat the form field per file. Total ≤ 200 MB. 30 s – 3 min per clip works best.

curl https://api.airforce/v1/voices/clone \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "name=My voice" \
  -F "description=Calm, conversational" \
  -F "consent_hash=9f4b0c8d2e..." \
  -F "files=@YOUR_SAMPLE_1.mp3" \
  -F "files=@YOUR_SAMPLE_2.mp3"

{
  "voice_id": "voice_01HXY...",
  "name": "My voice",
  "status": "active",
  "created_at": "2026-05-06T22:30:00Z"
}

Mind the field names: the create response returns the new voice as voice_id, whereas GET /v1/voices/library lists clones under provider_voice_id. Both hold the same identifier, the value you pass as voice.

3. List your library

GET https://api.airforce/v1/voices/library

curl https://api.airforce/v1/voices/library \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"

Parameter	Type	Required	Description
voices[].provider_voice_id	string	Optional	Pass as "voice" on TTS / voice-changer endpoints.
voices[].status	string	Optional	"active" | "errored" | "deleting".
voices[].provider	string	Optional	Upstream that hosts the clone.
voices[].last_error	string	Optional	Set when status is "errored".
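
A library listing typically needs to be split into clones that are ready to use and clones that failed. The sketch below does that under the field names in the table above; `usable_clones` and `failed_clones` are hypothetical helpers, and the sample records, including the provider value, are invented for illustration.

```python
def usable_clones(library):
    """Return provider_voice_id values for clones whose status is "active"."""
    return [
        voice["provider_voice_id"]
        for voice in library.get("voices", [])
        if voice.get("status") == "active"
    ]


def failed_clones(library):
    """Return (provider_voice_id, last_error) pairs for errored clones."""
    return [
        (voice.get("provider_voice_id"), voice.get("last_error"))
        for voice in library.get("voices", [])
        if voice.get("status") == "errored"
    ]


# Invented sample data in the documented shape.
library = {
    "voices": [
        {"provider_voice_id": "voice_a", "status": "active", "provider": "example"},
        {"provider_voice_id": "voice_b", "status": "errored", "last_error": "sample too short"},
        {"provider_voice_id": "voice_c", "status": "deleting"},
    ]
}

ready = usable_clones(library)
broken = failed_clones(library)
```

Each ID in `ready` can be passed as "voice" on the TTS and voice-changer endpoints.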

4. Update / delete

PATCH https://api.airforce/v1/voices/clone/:id

DELETE https://api.airforce/v1/voices/clone/:id

PATCH accepts name and description in a JSON body. DELETE removes the voice both locally and at the upstream provider.

Notes

Audio responses are returned as raw bytes with the correct Content-Type. PCM / µ-law formats are wrapped in a minimal WAV header so they are playable in a browser as-is.
Multipart endpoints (transcriptions, isolation, voice-changer, dubbing, cloning) accept up to 200 MB per request.
Voice IDs work across providers: a cloned ElevenLabs voice can be passed directly to /v1/audio/voice-changer.
Cost is metered per character (TTS), per second (music / SFX / dubbing / voice-changer), or per audio minute (transcription) and deducted from your balance. Audio endpoints do not send an X-Cost-Cents response header; track spend in the usage log on your dashboard.
