== Performance and Evaluation ==

In comparative evaluations against contemporary TTS systems, including [[ElevenLabs]] and [[Google]]'s Gemini 2.5 Pro TTS, VibeVoice reportedly achieved higher scores on subjective measures of realism, richness, and user preference. The model also achieved competitive [[word error rate]]s when its generated speech was transcribed by automatic speech recognition systems.<ref>https://huggingface.co/microsoft/VibeVoice-1.5B</ref>

However, these evaluations were conducted on a limited test set of eight conversational transcripts totaling approximately one hour of audio, raising questions about the generalizability of the results to broader use cases.