
Speech to Text
Scribe is the most accurate Speech to Text model. Scribe v2 Realtime sets the benchmark for live transcription, powering agents and real-time applications. Both are available via API.
Scribe v2 Realtime uses ElevenLabs’ streaming-first architecture to turn live speech into text instantly, across 90 languages.

Scribe v2 Realtime captures live speech in under 150 ms with exceptional accuracy – built for meetings and AI agents that demand instant understanding.
Scribe v2 Realtime delivers industry-leading accuracy with sub-150 ms latency, setting a new benchmark for real-time speech recognition.
Automatically detect when speech starts and stops, segmenting speech with precision for smoother live processing.
Delivering exceptional accuracy across accents, dialects, and recording conditions.
Build Scribe v2 Realtime into your products with the API, with full streaming support and commit control.
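
To make the idea of detecting when speech starts and stops more concrete, here is a minimal sketch of a simple energy-based segmenter. It is not how Scribe v2 Realtime works internally, which this page does not describe. The function names, the threshold and the hangover length are all hypothetical values chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def segment_speech(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,
    threshold: float = 0.02,   # hypothetical energy threshold
    hangover_frames: int = 5,  # silent frames tolerated before a segment closes
) -> List[Tuple[float, float]]:
    """Return (start_seconds, end_seconds) pairs for runs of loud frames."""
    frame_len = sample_rate * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("sample_rate * frame_ms must cover at least one sample")

    segments: List[Tuple[float, float]] = []
    start = None      # index of the first voiced frame in the open segment
    silent_run = 0    # consecutive silent frames seen inside the open segment
    n_frames = len(samples) // frame_len

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        if frame_rms(frame) >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= hangover_frames:
                end = i - silent_run + 1  # first silent frame after the speech
                segments.append((start * frame_ms / 1000, end * frame_ms / 1000))
                start = None
                silent_run = 0

    if start is not None:
        # Audio ended while a segment was still open.
        end = n_frames - silent_run
        segments.append((start * frame_ms / 1000, end * frame_ms / 1000))
    return segments
```

A production detector would typically use a trained model rather than a fixed energy threshold, since a threshold alone is easily fooled by background noise.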
Create captions, subtitles, and editable transcripts for podcasts, videos, interviews, and other recorded content – all with industry-leading accuracy in Studio or via API.



Upload audio or video in any format — MP4, MOV, MP3, WAV, and more. Scribe v1 automatically converts speech into precise text, ready for captions, subtitles, or editing.
Scribe achieves industry-leading transcription accuracy, delivering clean, editable text even in challenging audio conditions or across diverse accents.
Edit and finalize transcripts directly in ElevenLabs, or use our managed services team to reach 100% accuracy.
From laughter to footsteps, Scribe tags every sound event, enriching your transcripts with the full context.
In any conversation, even the busiest ones, Scribe intuitively distinguishes and labels every speaker.
Integrate Scribe v1 and Scribe v2 Realtime into your product with the API or SDKs.
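
As an illustration of turning a transcript into subtitles, the sketch below formats timed, speaker-labelled segments as SubRip (SRT) captions. The input shape (dictionaries with `start`, `end`, `text` and an optional `speaker`) is an assumption made for this example and is not the Scribe API response schema. The helper names are hypothetical too.

```python
from typing import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Mapping]) -> str:
    """Build SRT text from segments with 'start', 'end' and 'text' keys.

    An optional 'speaker' key (assumed shape) is prefixed to the caption line.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        speaker = seg.get("speaker")
        text = f"{speaker}: {seg['text']}" if speaker else seg["text"]
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    # A blank line separates consecutive caption blocks.
    return "\n".join(blocks)
```

Calling `to_srt` on a list of such segments returns a string that can be written directly to a `.srt` file.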

Enable real-time voice interactions with instant, low-latency transcription.
Convert recordings into editable text, captions, and repurposable content.

Our AI speech to text transcription supports 99 languages. Just select the language and upload your audio file.
Powered by ElevenLabs Agents