API REFERENCE

Audio

Text-to-speech, speech-to-text, music, sound effects, voice changing, dubbing, and voice cloning: one API key for every provider.

A single audio surface covers text-to-speech, transcription, music, sound effects, dubbing, voice changing, and voice cloning. The core endpoints are OpenAI-compatible, while richer extras (voice settings, speaker diarization, dubbing) are accepted wherever the upstream provider supports them.

List the available voices first; cloned voices you create appear in the same list and are used the same way.

Endpoints in this section, all under /v1/audio: /speech, /music, /sound-effects, /transcriptions, /audio-isolation, /voice-changer, /dubbing, /voices, plus /v1/voices/* for cloning.

Text-to-speech

Synthesises speech from text. Returns raw audio bytes with the matching Content-Type (e.g. audio/mpeg). PCM and µ-law formats include a WAV header so they play in any browser.

POST https://api.airforce/v1/audio/speech

TTS models

…· live

Parameter	Type	Required	Description
model	string	Required	TTS model ID. See /v1/models for IDs with input_modalities containing "text" and output_modalities containing "audio".
input	string	Required	Text to synthesise. Long inputs are chunked automatically.
voice	string	Required	Voice ID. Use GET /v1/audio/voices to list options. Cloned voices appear here too.
response_format	string	Optional	"mp3" (default), "mp3_44100_128", "mp3_44100_192", "pcm_22050", "pcm_24000", "pcm_44100", "ulaw_8000".
speed	float	Optional	0.25 – 4.0. OpenAI-compatible. Some upstream providers ignore this.
voice_settings	object	Optional	ElevenLabs-shape: { stability: 0–1, similarity_boost: 0–1, style: 0–1, use_speaker_boost: bool }.
language_code	string	Optional	ISO-639-1 hint, e.g. "de", "en", "ja". Improves prosody for multilingual models.
seed	integer	Optional	Reproducibility seed where supported.
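The ranges and allowed values in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name validate_speech_params is hypothetical, and the server may apply different or additional rules.

```python
# Client-side sanity check for a /v1/audio/speech payload.
# The limits mirror the parameter table; the helper itself is hypothetical.
ALLOWED_FORMATS = {
    "mp3", "mp3_44100_128", "mp3_44100_192",
    "pcm_22050", "pcm_24000", "pcm_44100", "ulaw_8000",
}
UNIT_RANGE_SETTINGS = ("stability", "similarity_boost", "style")


def validate_speech_params(params: dict) -> list[str]:
    """Return a list of problems found in the payload (empty if none)."""
    problems = []
    for key in ("model", "input", "voice"):
        if not isinstance(params.get(key), str) or not params[key]:
            problems.append(f"{key} is required and must be a non-empty string")
    fmt = params.get("response_format", "mp3")
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"unknown response_format: {fmt!r}")
    speed = params.get("speed")
    if speed is not None and not 0.25 <= speed <= 4.0:
        problems.append("speed must be between 0.25 and 4.0")
    settings = params.get("voice_settings", {})
    for key in UNIT_RANGE_SETTINGS:
        value = settings.get(key)
        if value is not None and not 0 <= value <= 1:
            problems.append(f"voice_settings.{key} must be between 0 and 1")
    return problems
```

An empty list means the payload passed these local checks; it does not guarantee that the upstream provider accepts every optional field.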

Example

curl https://api.airforce/v1/audio/speech \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "model": "elevenlabs-multilingual-v2",
    "input": "Willkommen bei Airforce.",
    "voice": "21m00Tcm4TlvDq8ikWAM",
    "response_format": "mp3_44100_128",
    "voice_settings": {"stability": 0.6, "similarity_boost": 0.8}
  }'
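The same call can be sketched in Python with only the standard library. The function names build_speech_request and synthesize_to_file are illustrative assumptions, not an official client; the URL, headers, and payload fields come from the curl example above.

```python
import json
import urllib.request

BASE_URL = "https://api.airforce/v1"


def build_speech_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for /v1/audio/speech."""
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key: str, payload: dict, path: str) -> str:
    """Send the request, write the audio bytes to `path`, return the Content-Type."""
    with urllib.request.urlopen(build_speech_request(api_key, payload)) as resp:
        content_type = resp.headers.get("Content-Type", "")
        with open(path, "wb") as out:
            out.write(resp.read())
    return content_type
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any characters are billed.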

List voices

Returns every voice you can pass as the "voice" parameter in TTS / voice-over / audiobook calls. Cloned voices are also returned here once their status is active.

GET https://api.airforce/v1/audio/voices

curl https://api.airforce/v1/audio/voices \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"

Response structure

Parameter	Type	Required	Description
voices[]	array	Optional	List of voice descriptors.
voices[].voice_id	string	Optional	Provider-native voice identifier — the field is voice_id (not id). Pass this value as "voice".
voices[].name	string	Optional	Human-readable name.
voices[].description	string	Optional	Short description, when the upstream exposes one.
voices[].category	string	Optional	"premade" | "cloned" | "professional".
voices[].preview_url	string	Optional	Short audio sample, when the upstream exposes one.
voices[].labels	object	Optional	Free-form metadata: gender, language, accent, age, use case.
live	boolean	Optional	true when the catalog came from a live upstream call; false when served from the built-in premade fallback.

{
  "voices": [
    {
      "voice_id": "CwhRBWXzGAHq8TQ4Fs17",
      "name": "Roger - Laid-Back, Casual, Resonant",
      "description": "Easy going and perfect for casual conversations.",
      "preview_url": "https://.../58ee3ff5.mp3",
      "category": "premade",
      "labels": {"accent": "american", "gender": "male", "language": "en", "use_case": "conversational"}
    }
  ],
  "live": true
}
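Picking a voice from this response usually means filtering on labels and reading voice_id. A small sketch follows, assuming the response shape shown above; find_voices is a hypothetical helper and the catalog literal is a trimmed copy of the sample response.

```python
def find_voices(catalog: dict, **wanted_labels: str) -> list[dict]:
    """Return voices whose labels match every key/value in `wanted_labels`."""
    matches = []
    for voice in catalog.get("voices", []):
        labels = voice.get("labels") or {}  # labels may be absent for some voices
        if all(labels.get(k) == v for k, v in wanted_labels.items()):
            matches.append(voice)
    return matches


catalog = {
    "voices": [
        {
            "voice_id": "CwhRBWXzGAHq8TQ4Fs17",
            "name": "Roger - Laid-Back, Casual, Resonant",
            "category": "premade",
            "labels": {"accent": "american", "gender": "male", "language": "en"},
        }
    ],
    "live": True,
}

# The identifier to pass as "voice" lives in voice_id, not id.
voice_ids = [v["voice_id"] for v in find_voices(catalog, language="en", gender="male")]
print(voice_ids)
```

Because labels are free-form metadata, an exact-match filter like this one may miss voices whose upstream uses different label keys or spellings.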

Music generation

Generates full music tracks from a text prompt. Returns binary audio.

POST https://api.airforce/v1/audio/music

This endpoint serves the native music models (e.g. music-v1). Suno models (suno-*) are not available here and return provider_not_supported; call them through the /v1/images/generations endpoint instead (see the Media reference).
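A client that accepts arbitrary music model IDs can route on the suno- prefix before sending anything. This is a minimal sketch of that rule; music_route is a hypothetical helper, and the request body expected by the other endpoint is not covered here.

```python
def music_route(model: str) -> str:
    """Pick the endpoint path for a music model ID."""
    if model.startswith("suno-"):
        # Suno models are rejected by /v1/audio/music with provider_not_supported.
        return "/v1/images/generations"
    return "/v1/audio/music"
```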

Parameter	Type	Required	Description
model	string	Required	Music model ID, e.g. "music-v1".
prompt	string	Required	Style / mood / structure description.
duration_seconds	integer	Optional	Track length. Range depends on the model (typically 15–120 s).
response_format	string	Optional	"mp3" (default) or provider-native.
instrumental	boolean	Optional	When true, suppresses vocals.
style	string	Optional	Optional genre tag list, e.g. "EDM, bass, dark".

curl https://api.airforce/v1/audio/music \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output track.mp3 \
  -d '{
    "model": "music-v1",
    "prompt": "Lofi hip-hop beat with soft piano and rain",
    "duration_seconds": 60,
    "instrumental": true
  }'

Sound effects

Short SFX from a text prompt. Same request shape as music, just with shorter durations.

POST https://api.airforce/v1/audio/sound-effects

Parameter	Type	Required	Description
model	string	Required	SFX model ID.
prompt	string	Required	Effect description, e.g. "thunder rumble fading into rain".
duration_seconds	integer	Optional	Length, typically 0.5–22 s.
response_format	string	Optional	"mp3" (default).

curl https://api.airforce/v1/audio/sound-effects \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --output thunder.mp3 \
  -d '{
    "model": "sfx-v1",
    "prompt": "Distant thunder rolling, then rain",
    "duration_seconds": 8
  }'

Transcription (speech-to-text)

Multipart upload of an audio file. Returns the transcribed text.

POST https://api.airforce/v1/audio/transcriptions

Transcription models

…· live

Parameter	Type	Required	Description
model	string	Required	Transcription model ID. See the live list above for valid IDs.
file	binary	Required	Audio file. Supports mp3, wav, m4a, flac, ogg, webm.
language_code	string	Optional	ISO-639-1 language hint (also accepted as "language"). Auto-detected when omitted.
diarize	boolean	Optional	Separate speakers. When true, each word carries a speaker_id.
num_speakers	integer	Optional	Expected speaker count, used together with diarize.
tag_audio_events	boolean	Optional	Mark non-speech events (laughter, silence, music) in the output.
timestamps_granularity	string	Optional	"word" (default) or "character".
additional_formats	string	Optional	Request extra rendered outputs (e.g. srt / vtt) alongside the JSON.

curl https://api.airforce/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "file=@audio.mp3" \
  -F "model=elevenlabs-scribe" \
  -F "language_code=de" \
  -F "diarize=true"
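curl's -F flag builds a multipart/form-data body. Without a third-party HTTP library, the same body can be assembled by hand. The sketch below is an illustration using only the standard library; encode_multipart is a hypothetical helper, and boolean fields are passed as the strings "true" / "false" to match the curl example.

```python
import mimetypes
import uuid


def encode_multipart(fields: dict, files: dict) -> tuple[bytes, str]:
    """Encode text fields and files as multipart/form-data.

    `files` maps a form field name to a (filename, bytes) pair.
    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    for name, (filename, content) in files.items():
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body goes in the POST data and the second value in the Content-Type header, alongside the usual Authorization header. The same encoder would apply to the other multipart endpoints in this section.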

Response structure

{
  "language_code": "deu",
  "language_probability": 0.98,
  "text": "Willkommen zum Meeting...",
  "words": [
    {"text": "Willkommen", "start": 0.0, "end": 0.62, "type": "word", "logprob": -0.08, "speaker_id": "speaker_0"},
    {"text": " ", "start": 0.62, "end": 0.62, "type": "spacing", "logprob": 0.0}
  ],
  "audio_duration_secs": 412.5,
  "transcription_id": "tx_01HXY..."
}

The response follows the upstream provider's native structure (ElevenLabs Scribe), not OpenAI Whisper's: tokens come back as a flat words[] array (each element has a type of word or spacing and a logprob), not segments[]. Duration is in audio_duration_secs, and language_code uses ISO-639-3 (e.g. eng, deu). The per-word speaker_id appears only when you pass diarize=true.
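Since there is no segments[] array, speaker turns have to be rebuilt from the flat words[] list. The sketch below shows one way to do that; speaker_turns is a hypothetical helper, and it assumes that spacing tokens carry no speaker_id (as in the sample above) and therefore belong to the current speaker.

```python
def speaker_turns(words: list[dict]) -> list[tuple]:
    """Collapse a flat words[] array into (speaker_id, text) turns."""
    turns = []
    current = None
    for token in words:
        # Tokens without speaker_id (e.g. spacing) inherit the current speaker.
        speaker = token.get("speaker_id", current)
        if speaker != current or not turns:
            turns.append([speaker, ""])
            current = speaker
        turns[-1][1] += token["text"]
    return [(speaker, text.strip()) for speaker, text in turns]


words = [
    {"text": "Hallo", "type": "word", "speaker_id": "speaker_0"},
    {"text": " ", "type": "spacing"},
    {"text": "zusammen", "type": "word", "speaker_id": "speaker_0"},
    {"text": " ", "type": "spacing"},
    {"text": "Danke", "type": "word", "speaker_id": "speaker_1"},
]
turns = speaker_turns(words)
```

Without diarize=true no token has a speaker_id, so the whole transcript collapses into a single turn whose speaker is None.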

Audio isolation

Removes background noise from a clip while keeping the foreground speech. Multipart upload, returns audio.

POST https://api.airforce/v1/audio/audio-isolation

Parameter	Type	Required	Description
model	string	Required	Isolation model ID.
file	binary	Required	Input audio.

curl https://api.airforce/v1/audio/audio-isolation \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=isolation-v1" \
  -F "file=@input.mp3" \
  --output clean.mp3

Voice changer (speech-to-speech)

Takes input speech and re-renders it in a different voice while preserving timing and intonation.

POST https://api.airforce/v1/audio/voice-changer

Parameter	Type	Required	Description
model	string	Required	Voice-change model ID.
voice	string	Required	Target voice ID. Same catalog as TTS.
file	binary	Required	Input audio.
voice_settings	object	Optional	Optional ElevenLabs-shape settings (stability, similarity_boost, …).

curl https://api.airforce/v1/audio/voice-changer \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=voice-changer-v1" \
  -F "voice=21m00Tcm4TlvDq8ikWAM" \
  -F "file=@input.mp3" \
  --output transformed.mp3

Dubbing

Asynchronous dubbing into one target language. Returns a dubbing_id immediately; poll the status until it is "dubbed", then download the dubbed audio for that language.

1. Create job

POST https://api.airforce/v1/audio/dubbing

Parameter	Type	Required	Description
model	string	Required	Dubbing model ID.
file	binary	Required	Source audio or video (mp3, wav, m4a, mp4 — audio is extracted automatically). Alternatively pass source_url.
target_lang	string	Required	Target language code (ISO-639-1). One language per job — repeating the field does not add languages.
source_lang	string	Optional	Source language. "auto" or omit for auto-detect.
num_speakers	integer	Optional	Hint for diarization. Auto when omitted.
drop_background_audio	boolean	Optional	Remove background music / noise from the dub.
watermark	boolean	Optional	Add an audible watermark to the output.

curl https://api.airforce/v1/audio/dubbing \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "model=dubbing-v1" \
  -F "file=@english.mp4" \
  -F "target_lang=de" \
  -F "source_lang=en"

{
  "dubbing_id": "abc123def456",
  "expected_duration_sec": 42.5
}

2. Poll status

GET https://api.airforce/v1/audio/dubbing/:dubbing_id

The status is passed through verbatim from the provider. status reads "dubbing" while the job is running and "dubbed" once it is ready (not "completed"). Languages are listed under target_languages (not available_languages), and there is no progress field.

{
  "dubbing_id": "abc123def456",
  "status": "dubbed",
  "source_language": "en",
  "target_languages": ["de"],
  "media_metadata": {"duration": 42.5, "content_type": "video/mp4"},
  "name": "english.mp4",
  "created_at": "2026-05-06T22:30:00Z",
  "editable": false,
  "error": null
}
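A polling loop for this endpoint could look like the sketch below. It takes the HTTP call as an injected function so the loop can be exercised offline. wait_for_dub and fetch_status are hypothetical names, the default interval and timeout are arbitrary, and treating a non-null error field as a failed job is an assumption about how failures are reported.

```python
import time
from typing import Callable


def wait_for_dub(
    fetch_status: Callable[[str], dict],
    dubbing_id: str,
    interval: float = 5.0,
    timeout: float = 600.0,
) -> dict:
    """Poll until status is "dubbed"; raise on a reported error or on timeout.

    `fetch_status` is expected to GET /v1/audio/dubbing/:dubbing_id and
    return the parsed JSON body.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(dubbing_id)
        if job.get("error"):
            raise RuntimeError(f"dubbing failed: {job['error']}")
        if job.get("status") == "dubbed":  # the ready state is "dubbed", not "completed"
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dubbing {dubbing_id} still {job.get('status')!r}")
        time.sleep(interval)
```

With no progress field to read, expected_duration_sec from the create response is the only available hint for choosing a sensible interval.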

3. Download per language

GET https://api.airforce/v1/audio/dubbing/:dubbing_id/audio/:lang

curl https://api.airforce/v1/audio/dubbing/abc123def456/audio/de \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  --output german.mp3
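The download URL can be derived from the status payload instead of being hard-coded. A short sketch, assuming the status shape shown in step 2; dub_audio_urls is a hypothetical helper.

```python
from urllib.parse import quote

BASE_URL = "https://api.airforce/v1"


def dub_audio_urls(job: dict) -> dict:
    """Map each target language of a finished job to its download URL."""
    if job.get("status") != "dubbed":
        raise ValueError(f"job is not ready: status={job.get('status')!r}")
    dubbing_id = quote(job["dubbing_id"], safe="")
    return {
        lang: f"{BASE_URL}/audio/dubbing/{dubbing_id}/audio/{quote(lang, safe='')}"
        for lang in job.get("target_languages", [])
    }
```

Each URL still needs the Authorization header when fetched, as in the curl example.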

Voice cloning

Clone a voice from short audio samples and reuse it on every voice endpoint. Voice cloning requires explicit consent: fetch the current consent text, hash it, and send the hash along with your samples.

1. Fetch consent text

GET https://api.airforce/v1/voices/consent-text

{
  "text": "I confirm that the voice samples I am uploading are either my own voice or a voice I have explicit permission to clone…",
  "hash": "9f4b0c8d2e…"
}
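The hash field can be passed through as-is, or recomputed locally from the text. The sketch below does both. It assumes the hash is the lowercase hex SHA-256 digest of the UTF-8 encoded text, which this page does not spell out, so the served hash is preferred whenever it is present. The function names are hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Lowercase hex SHA-256 of the UTF-8 encoded text (assumed encoding)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def pick_consent_hash(consent: dict) -> str:
    """Prefer the hash served by the API; fall back to hashing the text."""
    return consent.get("hash") or sha256_hex(consent["text"])
```

Comparing sha256_hex(consent["text"]) with the served hash once is a cheap way to confirm the assumption before relying on the local fallback.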

2. Create the clone

POST https://api.airforce/v1/voices/clone

Parameter	Type	Required	Description
name	string	Required	Public voice name shown in the library.
description	string	Optional	Optional free-text description.
consent_hash	string	Required	SHA-256 of the consent paragraph. Fetch the current text via GET /v1/voices/consent-text and pass its hash field.
files	binary	Required	1–25 audio samples. Repeat the form field per file. Total ≤ 200 MB. 30 s – 3 min per clip works best.

curl https://api.airforce/v1/voices/clone \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY" \
  -F "name=My voice" \
  -F "description=Calm, conversational" \
  -F "consent_hash=9f4b0c8d2e..." \
  -F "files=@sample1.mp3" \
  -F "files=@sample2.mp3"

{
  "voice_id": "voice_01HXY...",
  "name": "My voice",
  "status": "active",
  "created_at": "2026-05-06T22:30:00Z"
}

Note on field names: the create response returns the new voice as voice_id, while GET /v1/voices/library lists clones under provider_voice_id. Both hold the same identifier, which is the value you pass as voice.
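Client code that handles both responses can hide the naming difference behind one accessor. A minimal sketch; voice_id_of is a hypothetical helper and the ID in the usage line is made up.

```python
def voice_id_of(record: dict) -> str:
    """Return the value to pass as "voice", whichever field name the record uses."""
    for key in ("voice_id", "provider_voice_id"):
        if record.get(key):
            return record[key]
    raise KeyError("record has neither voice_id nor provider_voice_id")


# Works for a create response and for a library entry alike.
same = voice_id_of({"voice_id": "voice_x"}) == voice_id_of({"provider_voice_id": "voice_x"})
```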

3. List your library

GET https://api.airforce/v1/voices/library

curl https://api.airforce/v1/voices/library \
  -H "Authorization: Bearer sk-air-YOUR_API_KEY"

Parameter	Type	Required	Description
voices[].provider_voice_id	string	Optional	Pass as "voice" on TTS / voice-changer endpoints.
voices[].status	string	Optional	"active" | "errored" | "deleting".
voices[].provider	string	Optional	Upstream that hosts the clone.
voices[].last_error	string	Optional	Set when status is "errored".
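Only clones with status "active" are usable as a voice, so a client will typically split the library by status. The sketch below assumes the response is an object with a top-level voices array, as the field names in the table suggest; split_library and the sample IDs are hypothetical.

```python
def split_library(library: dict) -> tuple:
    """Return (usable voice IDs, {voice ID: last_error} for errored clones)."""
    usable, failed = [], {}
    for voice in library.get("voices", []):
        voice_id = voice.get("provider_voice_id")
        status = voice.get("status")
        if status == "active":
            usable.append(voice_id)
        elif status == "errored":
            failed[voice_id] = voice.get("last_error") or "unknown error"
        # Clones in "deleting" are skipped: they can no longer be relied on.
    return usable, failed


library = {
    "voices": [
        {"provider_voice_id": "voice_a", "status": "active"},
        {"provider_voice_id": "voice_b", "status": "errored", "last_error": "sample too short"},
        {"provider_voice_id": "voice_c", "status": "deleting"},
    ]
}
usable, failed = split_library(library)
```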

4. Update / delete

PATCH https://api.airforce/v1/voices/clone/:id

DELETE https://api.airforce/v1/voices/clone/:id

PATCH accepts name and description in a JSON body. DELETE removes the voice both locally and at the upstream provider.
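Both calls can be sketched with the standard library as follows. The function names are hypothetical, and the page does not say which identifier fills :id, so the parameter is simply called clone_id here.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.airforce/v1"


def update_clone_request(api_key, clone_id, name=None, description=None):
    """Build a PATCH request carrying only the fields that were provided."""
    fields = {"name": name, "description": description}
    body = {k: v for k, v in fields.items() if v is not None}
    return urllib.request.Request(
        f"{BASE_URL}/voices/clone/{quote(clone_id, safe='')}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


def delete_clone_request(api_key, clone_id):
    """Build a DELETE request; no body is sent."""
    return urllib.request.Request(
        f"{BASE_URL}/voices/clone/{quote(clone_id, safe='')}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE",
    )
```

Either request is sent with urllib.request.urlopen(...). Since DELETE also removes the voice at the upstream provider, it is best treated as irreversible.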

Notes

Audio responses are returned as raw bytes with the matching Content-Type. PCM / µ-law formats are wrapped in a minimal WAV header so they can be played directly in the browser.
Multipart endpoints (transcriptions, isolation, voice-changer, dubbing, cloning) accept up to 200 MB per request.
Voice IDs work across providers: a cloned ElevenLabs voice can be passed straight into /v1/audio/voice-changer.
Cost is charged per character (TTS), per second (music / SFX / dubbing / voice-changer), or per minute of audio (transcription) and is deducted from your balance. Audio endpoints do not send the X-Cost-Cents response header; track spend in the usage log on your dashboard.
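Because PCM responses carry a WAV header, their sample rate and duration can be read with Python's standard wave module. The sketch below is illustrative only: describe_wav is a hypothetical helper, it assumes the header reports the real data length, and the wave module reads PCM only, so ulaw_8000 output needs a different decoder.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read the WAV header from PCM response bytes held in memory."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_secs": frames / rate,
        }
```

For a pcm_24000 response, sample_rate would be expected to read 24000; a mismatch suggests the upstream returned a different format than requested.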