Audio
Text-to-speech, speech-to-text transcription, and audio translation using TTS and Whisper models. OpenAI-compatible.
Generate speech from text
Authorization
api_key API key authentication using Bearer token format
In: header
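As a minimal sketch of the scheme above, the API key travels in a request header using the Bearer format. The header name Authorization is assumed from the standard HTTP Bearer convention; the helper auth_headers and the environment variable AUDIO_API_KEY are hypothetical names used only for illustration.

```python
import os


def auth_headers(api_key: str) -> dict[str, str]:
    """Build the header that carries the API key as a Bearer token."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {"Authorization": f"Bearer {api_key}"}


if __name__ == "__main__":
    # AUDIO_API_KEY is a hypothetical variable name, not one defined by this API.
    print(auth_headers(os.environ.get("AUDIO_API_KEY", "sk-example")))
```

The same header would accompany each of the three endpoints in this section.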
Request Body
application/json
input: The text to generate audio for. Maximum length is 4096 characters.
instructions: Control the voice with additional instructions. Does not work with tts-1 or tts-1-hd.
model: One of the available TTS models: tts-1, tts-1-hd, or gpt-4o-mini-tts.
speed (float): The speed of the generated audio. Select a value from 0.25 to 4.0. Default is 1.0.
voice: The voice to use when generating the audio.
"alloy" | "ash" | "ballad" | "coral" | "echo" | "fable" | "nova" | "onyx" | "sage" | "shimmer" | "verse" | "marin" | "cedar"
Response Body
audio/mpeg
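The limits listed for the request body (4096 characters of text, a speed between 0.25 and 4.0, the model and voice lists, and no instructions for tts-1 or tts-1-hd) can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation. The helper build_speech_payload is hypothetical, and the field names speed and instructions are assumed from the OpenAI-compatible convention; input, model, and voice appear in the curl example for this endpoint.

```python
import json

VOICES = {"alloy", "ash", "ballad", "coral", "echo", "fable", "nova",
          "onyx", "sage", "shimmer", "verse", "marin", "cedar"}
MODELS = {"tts-1", "tts-1-hd", "gpt-4o-mini-tts"}
NO_INSTRUCTION_MODELS = {"tts-1", "tts-1-hd"}


def build_speech_payload(text, model, voice, speed=1.0, instructions=None):
    """Validate the documented limits and return the JSON request body."""
    if len(text) > 4096:
        raise ValueError("input exceeds 4096 characters")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    if instructions is not None and model in NO_INSTRUCTION_MODELS:
        raise ValueError(f"instructions are not supported by {model}")

    # Field names "speed" and "instructions" are assumptions (see lead-in).
    payload = {"input": text, "model": model, "voice": voice, "speed": speed}
    if instructions is not None:
        payload["instructions"] = instructions
    return payload


if __name__ == "__main__":
    body = build_speech_payload("Hello there.", "gpt-4o-mini-tts", "coral",
                                instructions="Speak calmly.")
    print(json.dumps(body, indent=2))
```

Because the response body is audio/mpeg, a client would typically write the returned bytes directly to an .mp3 file rather than parse them.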
curl -X POST "https://loading/api/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "string",
    "model": "string",
    "voice": "alloy"
  }'
Transcribe audio to text
Authorization
api_key API key authentication using Bearer token format
In: header
Request Body
multipart/form-data
include: Additional information to include in the response.
known_speaker_names: Speaker names corresponding to known_speaker_references. Up to 4 speakers.
known_speaker_references: Audio samples as data URLs for known speaker references. 2-10 seconds each.
language: The language of the input audio in ISO-639-1 format (e.g., "en"). Supplying the input language improves accuracy and latency.
model: ID of the model to use. Options: gpt-4o-transcribe, gpt-4o-mini-transcribe, whisper-1, gpt-4o-transcribe-diarize.
prompt: Optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
stream: If true, stream the response using server-sent events. Not supported for whisper-1.
temperature (float): The sampling temperature, between 0 and 1.
timestamp_granularities: The timestamp granularities to populate. Requires response_format=verbose_json.
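Several of the transcription parameters constrain one another: streaming is unavailable for whisper-1, timestamp granularities need response_format=verbose_json, and at most 4 known speakers are allowed. The sketch below checks those rules before any upload. It is an illustration under assumptions: build_transcription_fields is a hypothetical helper, the field names follow the OpenAI-compatible convention, and the audio file part is left out because this section does not name its field.

```python
TRANSCRIBE_MODELS = {"gpt-4o-transcribe", "gpt-4o-mini-transcribe",
                     "whisper-1", "gpt-4o-transcribe-diarize"}


def build_transcription_fields(model, language=None, prompt=None, stream=False,
                               temperature=None, response_format=None,
                               timestamp_granularities=None,
                               known_speaker_names=None,
                               known_speaker_references=None):
    """Check the documented constraints and return the non-file form fields."""
    if model not in TRANSCRIBE_MODELS:
        raise ValueError(f"unknown model: {model}")
    if stream and model == "whisper-1":
        raise ValueError("stream is not supported for whisper-1")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if timestamp_granularities and response_format != "verbose_json":
        raise ValueError(
            "timestamp_granularities requires response_format=verbose_json")

    names = list(known_speaker_names or [])
    refs = list(known_speaker_references or [])
    if len(names) > 4:
        raise ValueError("at most 4 known speakers are allowed")
    # Assumption: "corresponding" means exactly one reference per name.
    if len(names) != len(refs):
        raise ValueError("each speaker name needs one reference sample")
    if any(not ref.startswith("data:") for ref in refs):
        raise ValueError("speaker references must be data URLs")

    fields = {"model": model, "language": language, "prompt": prompt,
              "temperature": temperature, "response_format": response_format,
              "timestamp_granularities": timestamp_granularities,
              "known_speaker_names": names or None,
              "known_speaker_references": refs or None}
    if stream:
        fields["stream"] = True
    # Drop unset fields so only explicit values are sent.
    return {key: value for key, value in fields.items() if value is not None}
```

How list-valued fields are encoded in the multipart body is left to the HTTP client, since the section does not specify it.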
Response Body
application/json
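The JSON response for this endpoint carries text, logprobs, and usage fields, as in the sample response in this section. The sketch below reads the transcript and turns the per-token logprob values into a rough confidence figure. It assumes each logprob is a natural-log probability, which this section does not state; summarize_transcription is a hypothetical helper.

```python
import json
import math


def summarize_transcription(raw):
    """Return the transcript text and the mean token probability, if any."""
    data = json.loads(raw)
    logprobs = data.get("logprobs") or []
    # Assumption: each logprob is a natural-log probability for one token.
    probs = [math.exp(item["logprob"]) for item in logprobs]
    mean_prob = sum(probs) / len(probs) if probs else None
    return data["text"], mean_prob


if __name__ == "__main__":
    sample = ('{"logprobs": [{"bytes": [104, 105], "logprob": -0.1, '
              '"token": "hi"}], "text": "hi", "usage": {}}')
    text, confidence = summarize_transcription(sample)
    print(text, confidence)
```

When the response has no logprobs list, the helper returns None for the confidence rather than guessing a value.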
curl -X POST "https://loading/api/v1/audio/transcriptions" \
  -F model="string"
{
  "logprobs": [
    {
      "bytes": [
        0
      ],
      "logprob": 0.1,
      "token": "string"
    }
  ],
  "text": "string",
  "usage": {}
}
Translate audio to English text
Authorization
api_key API key authentication using Bearer token format
In: header
Request Body
multipart/form-data
model: ID of the model to use. Only whisper-1 is currently available.
prompt: Optional text to guide the model's style or continue a previous audio segment. The prompt should be in English.
temperature (float): The sampling temperature, between 0 and 1.
Response Body
application/json
curl -X POST "https://loading/api/v1/audio/translations" \
  -F model="string"
{
  "text": "string"
}
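The translation endpoint takes a multipart/form-data body, which curl builds from its -F flags. For clients that assemble the body themselves, the sketch below validates the documented text fields and encodes them by hand. It is a simplified illustration: both helper names are hypothetical, the field names prompt and temperature are assumed from the OpenAI-compatible convention, and the audio file part is omitted because this section does not name its field.

```python
import uuid


def build_translation_fields(prompt=None, temperature=None):
    """Return the documented text fields for a translation request."""
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    fields = {"model": "whisper-1"}  # the only model this section lists
    if prompt is not None:
        fields["prompt"] = prompt  # per the section, should be in English
    if temperature is not None:
        fields["temperature"] = temperature
    return fields


def encode_multipart(fields, boundary=None):
    """Encode simple text fields as a multipart/form-data body.

    Field names are written as-is, so this sketch only suits plain
    ASCII names without quotes.
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(str(value))
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


if __name__ == "__main__":
    body, content_type = encode_multipart(
        build_translation_fields(prompt="Meeting notes", temperature=0.2))
    print(content_type)
    print(body.decode("utf-8"))
```

The returned content type string belongs in the Content-Type header so the server can find the boundary between parts.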