Whisper-Large-V3-Turbo
The Speech-to-text guide has runnable examples for basic transcription, multi-language usage, large-file chunking, and a transcription + summarisation pipeline.
Description
“Whisper-Large-V3-Turbo” is a multilingual automatic speech recognition (ASR) model developed by OpenAI, optimized for speed and efficiency. It is based on the architecture of the well-known “Whisper-Large-V3” model but uses a lighter decoder to significantly reduce latency with only a minimal loss in accuracy. The model supports over 99 languages and is ideal for transcribing speech inputs.
The following limitations apply to this model on our platform:
- Maximum file size: 25 MB per upload
- No explicit context length limit – depends on audio duration and file size
- Translation is currently not supported (to_language)
- Supported output formats: json, verbose_json
  - response_format="text" is accepted but returns a JSON body regardless; use "json" instead
  - srt and vtt are not supported (HTTP 400)
Supported Input Formats
mp3, ogg, wav, flac
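The size limit and input formats above can be checked on the client before an upload is attempted. The following is a minimal sketch, not part of the platform API: the helper name `check_audio_file` is hypothetical, the format check only looks at the file extension, and reading "25 MB" as 25 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Values taken from the limitations and input-format lists above.
# Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".mp3", ".ogg", ".wav", ".flac"}


def check_audio_file(path):
    """Return a list of problems; an empty list means the file looks uploadable.

    Hypothetical helper: it inspects only the extension and the size on disk,
    not the actual audio encoding.
    """
    p = Path(path)
    problems = []
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("file not found")
    elif p.stat().st_size > MAX_UPLOAD_BYTES:
        problems.append(f"file too large: {p.stat().st_size} bytes")
    return problems
```

Files that exceed the limit would need to be split or re-encoded before upload; how to do that is outside the scope of this sketch.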
Supported values for parameter language (ISO-639-1 language codes)
af, ar, az, be, bg, bs, ca, cs, cy, da, de, el, en, es, et, fa, fi, fr, gl, he, hi, hr, hu, hy, id, is, it, ja, kk, kn, ko, lt, lv, mk, mi, mr, ms, ne, nl, no, pl, pt, ro, ru, sk, sl, sr, sv, sw, ta, th, tl, tr, uk, ur, vi, zh
Recommended Inference Parameters
- temperature=1.0
- top_p=1.0
- response_format="json"
- language (for example language="de") should always be set explicitly to maximize accuracy. If no value is provided, German ("de") will be assumed by default, which may result in poorer outcomes for inputs in other languages.
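One way to make sure the language is never left to the German default is to assemble the request parameters in a small helper that requires it. This is a sketch under assumptions: `build_transcription_params` is a hypothetical name, and the returned dictionary is only a container for the documented parameter names; how it is sent (form fields, client library arguments) depends on the client in use.

```python
# Language codes copied from the supported-values list above.
SUPPORTED_LANGUAGES = frozenset(
    "af ar az be bg bs ca cs cy da de el en es et fa fi fr gl he hi hr hu hy "
    "id is it ja kk kn ko lt lv mk mi mr ms ne nl no pl pt ro ru sk sl sr sv "
    "sw ta th tl tr uk ur vi zh".split()
)
# Only these two output formats are documented as supported.
SUPPORTED_RESPONSE_FORMATS = ("json", "verbose_json")


def build_transcription_params(language, response_format="json"):
    """Assemble request parameters using the recommended values.

    `language` is a required argument on purpose: if it were omitted from the
    request, "de" would be assumed by the platform.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if response_format not in SUPPORTED_RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    return {
        "temperature": 1.0,
        "top_p": 1.0,
        "response_format": response_format,
        "language": language,
    }
```

Rejecting "text", "srt", and "vtt" locally avoids a round trip that would end in a JSON body or an HTTP 400 anyway.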
Example output — response_format="json"
{
  "text": "This is the transcribed text of a speech input.",
  "usage": {
    "type": "duration",
    "seconds": 8
  }
}
Example output — response_format="verbose_json"
Returns additional metadata including detected language, duration, and per-segment timestamps:
{
  "text": "This is the transcribed text.",
  "language": "en",
  "duration": "8.0",
  "words": null,
  "segments": [
    {
      "id": 0,
      "avg_logprob": -0.45,
      "text": " This is the transcribed text.",
      "start": 0.0,
      "end": 2.4
    }
  ]
}
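Because srt and vtt output are not supported, subtitles have to be built on the client if they are needed. The sketch below shows one possible approach: it converts the `segments` array of a verbose_json body into SubRip text. The function names are hypothetical, and the code assumes every segment carries `start`, `end`, and `text` as in the example above.

```python
import json


def format_srt_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS,mmm (SubRip style)."""
    total_ms = int(round(float(seconds) * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(response_body):
    """Convert a verbose_json response body (a JSON string) into SRT text.

    Assumes each segment has "start", "end", and "text" fields, as in the
    example output above. Returns an empty string if there are no segments.
    """
    data = json.loads(response_body)
    blocks = []
    for index, seg in enumerate(data.get("segments") or [], start=1):
        start = format_srt_timestamp(seg["start"])
        end = format_srt_timestamp(seg["end"])
        # Segment text starts with a space in the example, so strip it.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

Segment boundaries come from the model, so the resulting cues may be longer or shorter than typical subtitle conventions; any re-splitting would be an additional step.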
Terms of Use and Licensing
The general terms of use apply. The model is provided by OpenAI under the MIT License, and reuse of the generated content is subject to no additional restrictions.