
Realtime Speech to Text
Scribe v2 Realtime is the most accurate real-time transcription model, with ~150 ms latency across 90+ languages. Available via API.
Ultra-fast, ultra-accurate, and built for live speech. Scribe v2 Realtime delivers instant transcription for agents, meetings, and conversational AI.
Trained on diverse global data and fine-tuned for natural speech, Scribe achieves industry-leading word error rates (WER) across major languages and accents.
Stream audio and receive transcriptions in ~150 ms, enabling real-time understanding for live agents, meetings, and conversational AI.
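Word error rate, the metric behind the accuracy claim above, counts the substitutions (S), deletions (D) and insertions (I) needed to turn a transcript into the reference text, divided by the number of reference words N: WER = (S + D + I) / N. The Python sketch below computes it with a word-level Levenshtein distance. It is a generic illustration, not ElevenLabs' evaluation code, and the function name is hypothetical: it splits on whitespace and applies no casing or punctuation normalization, which published benchmarks usually do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, the reference "the cat sat on the mat" against the hypothesis "the cat sit on mat" has one substitution and one deletion, so WER = 2/6 ≈ 0.33.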

Scribe v2 Realtime is purpose-built for developers creating conversational agents, meeting assistants, and voice applications where speed and accuracy are critical.
Scribe v2 Realtime delivers consistent accuracy across 90+ languages and handles diverse accents, dialects, and acoustic conditions.
Supports PCM (8–48 kHz) and μ-law encoding for compatibility across telephony, browser, and studio setups.
Detects when speech starts and stops, segmenting audio precisely for smooth, efficient real-time transcription.
Gives developers control over when transcripts are finalized, which suits custom streaming pipelines that need to decide when a segment is complete.
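μ-law is the companding scheme from ITU-T G.711, the usual format for 8 kHz telephony audio, which maps each 16-bit linear PCM sample to a single byte. As a minimal sketch, assuming the source is signed 16-bit little-endian PCM, the standard bias-and-segment encoder and its decoder look like this in Python. The function names are hypothetical, and the page does not say which sample rates the service expects with μ-law, so check the API reference before sending audio.

```python
import struct

BIAS = 0x84    # 132, added to the magnitude before segment lookup (G.711 mu-law)
CLIP = 32635   # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment number 0..7: index of the leading 1 bit, minus 7.
    exponent = magnitude.bit_length() - 8
    # The four bits just below the leading 1 bit.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the bitwise complement of sign | segment | mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code: int) -> int:
    """Decode one mu-law byte back to a signed 16-bit PCM value."""
    u = ~code & 0xFF
    t = ((u & 0x0F) << 3) + BIAS
    t <<= (u & 0x70) >> 4
    return BIAS - t if u & 0x80 else t - BIAS


def pcm16le_to_ulaw(pcm: bytes) -> bytes:
    """Convert a buffer of 16-bit little-endian PCM to mu-law bytes."""
    return bytes(linear_to_ulaw(s) for (s,) in struct.iter_unpack("<h", pcm))
```

Decoding returns the midpoint of each quantization step, so round-trip error grows with amplitude (at most 512 for full-scale samples). That loss is the trade-off for halving bandwidth relative to 16-bit PCM.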
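The page does not describe how Scribe's voice activity detection works. For intuition, the simplest approach is an energy threshold: split audio into short frames, treat frames whose RMS level exceeds a threshold as speech, and close a segment only after a run of quiet frames (a "hangover") so that brief pauses do not split an utterance. The Python sketch below implements that baseline on 16-bit mono PCM. The function names, frame length, threshold, and hangover values are hypothetical defaults, and production VADs typically use trained models rather than raw energy.

```python
import math
import struct


def frame_rms(frame: bytes) -> float:
    """Root-mean-square level of a frame of 16-bit little-endian PCM."""
    samples = [s for (s,) in struct.iter_unpack("<h", frame)]
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_speech_segments(pcm: bytes, sample_rate: int, frame_ms: int = 20,
                           threshold: float = 500.0, hangover_frames: int = 10):
    """Return (start_s, end_s) tuples for regions whose energy exceeds threshold."""
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per mono sample
    n_frames = len(pcm) // frame_bytes                # trailing partial frame ignored
    segments = []
    start = None     # index of the first voiced frame in the open segment
    silent_run = 0   # consecutive quiet frames since the last voiced frame

    for i in range(n_frames):
        frame = pcm[i * frame_bytes:(i + 1) * frame_bytes]
        if frame_rms(frame) >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run > hangover_frames:
                end = i - silent_run + 1  # one past the last voiced frame
                segments.append((start * frame_ms / 1000, end * frame_ms / 1000))
                start, silent_run = None, 0

    if start is not None:  # speech ran to the end of the buffer
        end = n_frames - silent_run
        segments.append((start * frame_ms / 1000, end * frame_ms / 1000))
    return segments
```

Raising hangover_frames merges words separated by short pauses; lowering threshold catches quieter speakers at the cost of triggering on background noise.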

Built on the foundation of Scribe v1, Scribe v2 Realtime delivers ~150 ms latency with breakthrough accuracy across accents, tones, and environments.
Scribe v2 Realtime uses predictive transcription to anticipate the most probable next words and punctuation, which keeps accuracy high at real-time latency.
Built-in support for complex vocabulary including technical language, medications, and proper nouns.
Send audio in continuous chunks and receive live transcriptions instantly – no buffering, just real-time understanding.
Scribe v2 Realtime continues transcription seamlessly, even when the connection resets.
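Streaming in continuous chunks means slicing raw audio into fixed-duration pieces. The chunk size in bytes is sample rate × bytes per sample × channels × duration, so 100 ms of 16 kHz, 16-bit mono PCM is 16000 × 2 × 1 × 0.1 = 3200 bytes, and 20 ms of 8 kHz μ-law (one byte per sample) is 160 bytes. The page does not state a required chunk duration; the 100 ms default below is an assumption, and the generator name is hypothetical.

```python
from typing import Iterator


def chunk_audio(audio: bytes, sample_rate: int, chunk_ms: int = 100,
                sample_width: int = 2, channels: int = 1) -> Iterator[bytes]:
    """Yield fixed-duration slices of raw audio; the last slice may be shorter."""
    frame_size = sample_width * channels              # bytes per sample frame
    chunk_bytes = sample_rate * frame_size * chunk_ms // 1000
    chunk_bytes -= chunk_bytes % frame_size           # never split a sample
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too short for this sample rate")
    for offset in range(0, len(audio), chunk_bytes):
        yield audio[offset:offset + chunk_bytes]
```

When replaying a file to simulate a live source, pace the sends (for example, wait chunk_ms between chunks) so the server receives audio at real-time speed.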

Natural speech
Filler words, pauses and emotional cues

Low-quality audio
Background noise or low-bandwidth audio

Accents
Diverse accents and pronunciations

Domain terms
Acronyms, brands, financial or medical terms
Power real-time voice interactions and conversational AI with instant, low-latency transcription. Scribe v2 Realtime enables agents to listen, understand, and respond faster than ever.

Integrate ultra-fast speech-to-text directly into your product with a simple WebSocket or REST API. Stream audio as it happens and receive accurate text in ~150 ms.
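WebSocket text frames carry strings, so a common pattern for streaming binary audio as JSON is to base64-encode each chunk inside a small message object. The Python sketch below shows that pattern only: the keys "type", "audio_base64" and "sample_rate" and both function names are hypothetical placeholders, not the documented ElevenLabs message schema, which you should take from the official API reference.

```python
import base64
import json


def make_audio_message(chunk: bytes, sample_rate: int) -> str:
    """Wrap one raw audio chunk in a JSON string for a WebSocket text frame.

    HYPOTHETICAL schema: these keys are placeholders, not the real API format.
    """
    return json.dumps({
        "type": "audio_chunk",
        "audio_base64": base64.b64encode(chunk).decode("ascii"),
        "sample_rate": sample_rate,
    })


def parse_audio_message(message: str) -> bytes:
    """Recover the raw audio bytes from a message built above."""
    payload = json.loads(message)
    return base64.b64decode(payload["audio_base64"])
```

Base64 makes each payload roughly a third larger (a 3,200-byte chunk becomes 4,268 characters), which is usually small next to network round-trip time. Pass the returned string to your WebSocket client's send call.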

Experience best-in-class accuracy and responsiveness with pricing designed to scale from startups to enterprise teams.
$0.28 per hour and lower on annual Business plans
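To estimate spend at the quoted rate, multiply hours of audio by $0.28. The Python helper below does that conversion from minutes. It assumes billing is linear in audio duration and ignores plan minimums or rounding rules, which the page does not describe, so treat the result as a rough ceiling for annual Business plans. The constant and function names are hypothetical.

```python
RATE_USD_PER_HOUR = 0.28  # rate quoted on this page for annual Business plans


def estimate_cost_usd(audio_minutes: float,
                      rate_per_hour: float = RATE_USD_PER_HOUR) -> float:
    """Approximate cost, assuming billing is linear in audio duration."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return round(audio_minutes / 60 * rate_per_hour, 2)
```

For example, an agent that listens 8 hours a day for 30 days processes 240 hours of audio, or about 240 × $0.28 = $67.20 at that rate.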
