
In version 2.0 of Artificial Analysis' AA-WER speech-to-text benchmark, ElevenLabs' Scribe v2 has emerged as the top performer with a word error rate of just 2.3%. Google's Gemini 3 Pro follows at 2.9%, with Mistral's Voxtral Small close behind at 3.0%. Google's Gemini 3 Flash (3.1%) and ElevenLabs' earlier model, Scribe v1 (3.2%), round out the leaders.
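Word error rate is the standard metric behind these rankings: the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the reference length. As a hedged sketch (this is a textbook Levenshtein-based calculation, not the benchmark's actual scoring code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One wrong word in a 10-word reference gives a 10% WER; a 2.3% score
# therefore corresponds to roughly one error per 43 reference words.
print(wer("the quick brown fox jumps over the lazy dog today",
          "the quick brown fox jumped over the lazy dog today"))  # 0.1
```

In practice, benchmark pipelines also normalize transcripts (casing, punctuation, number formatting) before scoring, which this sketch omits.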
Notably, Gemini was not trained specifically for transcription; its strong showing stems from its general multimodal capabilities. OpenAI's widely used open-source Whisper Large v3 sits mid-pack at 4.2%, while Alibaba's Qwen3 ASR Flash (5.9%), Amazon's Nova 2 Omni (6.0%), and Rev AI (6.1%) trail the field.
On the AA-AgentTalk test, which evaluates speech directed at voice assistants, ElevenLabs' Scribe v2 and Google's Gemini 3 Pro again lead with error rates of 1.6% and 1.7%, respectively, followed by AssemblyAI's Universal-3 Pro at 2.3%. These results underscore the dominance of ElevenLabs and Google in both general speech-to-text and voice assistant-specific benchmarks.