Audio

Text-to-speech, speech-to-text transcription, and audio translation using TTS and Whisper models. OpenAI-compatible.

Generate speech from text

Authorization

api_key

Authorization: Bearer <token>

API key authentication using Bearer token format

In: header
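As a minimal sketch of the Bearer token format described above, the header can be built as follows. The helper name `auth_headers` is hypothetical and not part of the API; the same header applies to all three audio endpoints.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Return the Authorization header in Bearer token format.

    `auth_headers` is a hypothetical helper name used for illustration.
    """
    # Reject empty keys and keys with stray whitespace, which would
    # produce a malformed header value.
    if not api_key or api_key.strip() != api_key:
        raise ValueError("API key must be non-empty with no surrounding whitespace")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dictionary can be merged with other request headers, such as `Content-Type`.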

Request Body

application/json

input* string

The text to generate audio for. Maximum length is 4096 characters.

instructions? string | null

Control the voice with additional instructions. Does not work with tts-1 or tts-1-hd.

model* string

One of the available TTS models: tts-1, tts-1-hd, or gpt-4o-mini-tts.

response_format? null | SpeechResponseFormat

speed? number | null

The speed of the generated audio. Select a value from 0.25 to 4.0. Default is 1.0.

Format: float

stream_format? null | SpeechStreamFormat

voice* string

The voice to use when generating the audio.

Value in

"alloy" | "ash" | "ballad" | "coral" | "echo" | "fable" | "nova" | "onyx" | "sage" | "shimmer" | "verse" | "marin" | "cedar"

Response Body

`audio/mpeg`

curl -X POST "https://loading/api/v1/audio/speech" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  --output speech.mp3 \
  -d '{
    "input": "Hello, world.",
    "model": "tts-1",
    "voice": "alloy"
  }'
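The documented constraints on the speech request body can be checked on the client before a request is sent. The sketch below builds the JSON payload and enforces the limits stated above. The function name `build_speech_payload` is hypothetical, and raising an error when `instructions` is combined with tts-1 or tts-1-hd is a design choice of this sketch; the server may simply ignore the field.

```python
TTS_MODELS = {"tts-1", "tts-1-hd", "gpt-4o-mini-tts"}
VOICES = {
    "alloy", "ash", "ballad", "coral", "echo", "fable", "nova",
    "onyx", "sage", "shimmer", "verse", "marin", "cedar",
}
MAX_INPUT_CHARS = 4096


def build_speech_payload(text, model, voice, speed=None, instructions=None):
    """Build the JSON body for the speech endpoint (hypothetical helper)."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if model not in TTS_MODELS:
        raise ValueError(f"unknown TTS model: {model}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    # Assumption of this sketch: fail early rather than send a field
    # that the documentation says does not work with these models.
    if instructions is not None and model in {"tts-1", "tts-1-hd"}:
        raise ValueError("instructions does not work with tts-1 or tts-1-hd")

    payload = {"input": text, "model": model, "voice": voice}
    if speed is not None:
        if not 0.25 <= speed <= 4.0:
            raise ValueError("speed must be between 0.25 and 4.0")
        payload["speed"] = speed
    if instructions is not None:
        payload["instructions"] = instructions
    return payload
```

The returned dictionary can be serialized with `json.dumps` and sent as the `application/json` request body.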


Transcribe audio to text

Authorization

api_key

Authorization: Bearer <token>

API key authentication using Bearer token format

In: header

Request Body

multipart/form-data

chunking_strategy? null | TranscriptionChunkingStrategy

include? array | null

Additional information to include in the response.

known_speaker_names? array | null

Speaker names corresponding to known_speaker_references. Up to 4 speakers.

known_speaker_references? array | null

Audio samples as data URLs for known speaker references. 2-10 seconds each.
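A data URL embeds the sample bytes as base64 text. The sketch below shows one way to encode audio samples and pair them with speaker names. The helper names and the default `audio/wav` MIME type are assumptions, since the accepted sample formats are not listed here, and the 2-10 second duration limit is not checked.

```python
import base64


def to_data_url(audio_bytes: bytes, mime_type: str = "audio/wav") -> str:
    """Encode raw audio bytes as a base64 data URL.

    The default MIME type is an assumption of this sketch.
    """
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_known_speakers(names, samples):
    """Pair speaker names with data-URL references (hypothetical helper)."""
    if len(names) != len(samples):
        raise ValueError("each speaker name needs exactly one reference sample")
    if len(names) > 4:
        raise ValueError("at most 4 known speakers are supported")
    return {
        "known_speaker_names": list(names),
        "known_speaker_references": [to_data_url(s) for s in samples],
    }
```

For example, `build_known_speakers(["Ana"], [sample_bytes])` returns both fields with matching order, so the first name corresponds to the first reference.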

language? string | null

The language of the input audio in ISO-639-1 format (e.g., "en"). Supplying the input language improves accuracy and latency.

model* string

ID of the model to use. Options: gpt-4o-transcribe, gpt-4o-mini-transcribe, whisper-1, gpt-4o-transcribe-diarize.

prompt? string | null

Optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

response_format? null | AudioResponseFormat

stream? boolean | null

If true, stream the response using server-sent events. Not supported for whisper-1.

temperature? number | null

The sampling temperature, between 0 and 1.

Format: float

timestamp_granularities? array | null

The timestamp granularities to populate. Requires response_format=verbose_json.
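Several of the transcription parameters above depend on each other. The sketch below collects the form fields and checks those rules before sending. The function name `build_transcription_fields` is hypothetical, the language check only tests the two-letter shape of an ISO-639-1 code, and how list values are serialized into `multipart/form-data` is left to the HTTP client.

```python
TRANSCRIPTION_MODELS = {
    "gpt-4o-transcribe",
    "gpt-4o-mini-transcribe",
    "whisper-1",
    "gpt-4o-transcribe-diarize",
}


def build_transcription_fields(model, *, language=None, response_format=None,
                               stream=None, temperature=None,
                               timestamp_granularities=None):
    """Validate and collect transcription form fields (hypothetical helper)."""
    if model not in TRANSCRIPTION_MODELS:
        raise ValueError(f"unknown transcription model: {model}")
    if stream and model == "whisper-1":
        raise ValueError("stream is not supported for whisper-1")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if timestamp_granularities and response_format != "verbose_json":
        raise ValueError("timestamp_granularities requires response_format=verbose_json")
    # Shape check only: two ASCII letters, not a lookup against the ISO list.
    if language is not None and not (
        len(language) == 2 and language.isascii() and language.isalpha()
    ):
        raise ValueError("language should be a two-letter ISO-639-1 code")

    fields = {
        "model": model,
        "language": language,
        "response_format": response_format,
        "stream": stream,
        "temperature": temperature,
        "timestamp_granularities": timestamp_granularities,
    }
    # Optional fields left as None are omitted from the form.
    return {key: value for key, value in fields.items() if value is not None}
```

Failing early on the client keeps invalid combinations from reaching the server, where they would presumably be answered with a 400 response.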

Response Body

`application/json`

curl -X POST "https://loading/api/v1/audio/transcriptions" \
  -H "Authorization: Bearer <token>" \
  -F model="whisper-1"


{
  "logprobs": [
    {
      "bytes": [
        0
      ],
      "logprob": 0.1,
      "token": "string"
    }
  ],
  "text": "string",
  "usage": {}
}
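A response of the shape shown above can be summarized as in the following sketch. The function name `summarize_transcription` is hypothetical, and the code assumes that `logprobs` may be absent or null and that each `bytes` list holds the UTF-8 bytes of its token.

```python
def summarize_transcription(response: dict) -> dict:
    """Summarize a transcription response (hypothetical helper)."""
    entries = response.get("logprobs") or []
    # Join the raw bytes before decoding, since a multi-byte character
    # may be split across adjacent tokens.
    raw = b"".join(bytes(entry.get("bytes") or []) for entry in entries)
    mean_logprob = (
        sum(entry["logprob"] for entry in entries) / len(entries)
        if entries
        else None
    )
    return {
        "text": response["text"],
        "token_count": len(entries),
        "mean_logprob": mean_logprob,
        "decoded_tokens": raw.decode("utf-8", errors="replace"),
    }
```

A low mean log probability can serve as a rough signal that a transcript deserves manual review, although any threshold would need to be tuned on real data.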


Translate audio to English text

Authorization

api_key

Authorization: Bearer <token>

API key authentication using Bearer token format

In: header

Request Body

multipart/form-data

model* string

ID of the model to use. Only whisper-1 is currently available.

prompt? string | null

Optional text to guide the model's style or continue a previous audio segment. The prompt should be in English.

response_format? null | AudioResponseFormat

temperature? number | null

The sampling temperature, between 0 and 1.

Format: float

Response Body

`application/json`

curl -X POST "https://loading/api/v1/audio/translations" \
  -H "Authorization: Bearer <token>" \
  -F model="whisper-1"


{
  "text": "string"
}


Response status codes

Generate speech from text: 200 (`audio/mpeg`), 400, 500

Transcribe audio to text: 200 (`application/json`), 400, 500

Translate audio to English text: 200 (`application/json`), 400, 500