Chatterbox

Model Information
Developer: Resemble AI
Release date: May 2025
Latest version: Multilingual 2.0
Architecture: CosyVoice 2.0-based
Parameters: 500 million
Training data: 500,000 hours cleaned data
Capabilities
Languages: 23 languages (multilingual version)
Voices: Zero-shot voice cloning
Voice cloning: Yes (5-second reference)
Emotion control: Yes (exaggeration parameter)
Streaming: Yes
Latency: Sub-200ms
Availability
License: MIT License
Open source: Yes
Repository: GitHub
Model weights: Hugging Face
Demo: HF Spaces
Website: resemble.ai/chatterbox

Chatterbox is an open-source text-to-speech (TTS) model developed by Resemble AI and released in May 2025. Built on a CosyVoice 2.0-style modified Llama architecture with 500M parameters, it is marketed as the first open-source TTS model to include controllable emotion exaggeration, and it has gained attention for claims of outperforming established commercial systems in user preference evaluations.

Development and Release

Chatterbox was developed by a three-person team at Resemble AI, a voice technology company founded by Zohaib Ahmed and Saqib Muhammad.[1] The initial English-only version was released in May 2025 under the MIT License, followed by a multilingual version supporting 23 languages in September 2025.[2]

The project quickly gained popularity in the open-source community, accumulating over 1 million downloads on Hugging Face and more than 11,000 stars on GitHub within weeks of release.[3]

Technical Architecture

Chatterbox uses a 500-million-parameter model based on a CosyVoice-style modified Llama architecture, significantly smaller than many contemporary TTS systems. The model was trained on approximately 500,000 hours of cleaned audio data and employs what the developers term "alignment-informed inference" to improve stability during generation.
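
The model is distributed as a pip-installable Python package. The following is a minimal synthesis sketch modeled on the quickstart published in the project's repository; the package name (chatterbox-tts), class names, and method signatures reflect the initial release and may differ in later versions.

  # Minimal sketch, assuming the quickstart API from the project's README.
  # Install with: pip install chatterbox-tts
  import torchaudio as ta
  from chatterbox.tts import ChatterboxTTS

  # Fetch the MIT-licensed weights from Hugging Face and load them onto a GPU.
  model = ChatterboxTTS.from_pretrained(device="cuda")

  text = "Chatterbox is an open-source text-to-speech model."
  wav = model.generate(text)  # waveform tensor in the model's default voice

  # model.sr holds the output sample rate of the generated audio.
  ta.save("chatterbox-out.wav", wav, model.sr)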

Key technical features include:

  • Zero-shot voice cloning: Ability to clone voices from as little as 5 seconds of reference audio
  • Emotion exaggeration control: A novel parameter allowing users to adjust emotional intensity from monotone to dramatically expressive (both features are demonstrated in the sketch after this list)
  • Fast inference: Sub-200ms latency for real-time applications
  • Multilingual support: The updated version supports 23 languages including Arabic, Chinese, Hindi, and major European languages
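
Voice cloning and emotion exaggeration are exposed as optional arguments to the same generate call. The sketch below assumes the audio_prompt_path and exaggeration parameters shown in the repository's example code; the reference WAV path is illustrative.

  import torchaudio as ta
  from chatterbox.tts import ChatterboxTTS

  model = ChatterboxTTS.from_pretrained(device="cuda")

  # A short clip (roughly 5 seconds) of the target voice; path is illustrative.
  REFERENCE_WAV = "reference_speaker.wav"

  # exaggeration defaults to a neutral ~0.5 in the published examples;
  # higher values push the delivery toward more dramatic speech.
  wav = model.generate(
      "This line is rendered in the cloned voice with heightened emotion.",
      audio_prompt_path=REFERENCE_WAV,
      exaggeration=0.7,
  )
  ta.save("chatterbox-cloned.wav", wav, model.sr)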

Performance Claims and Evaluation

Resemble AI commissioned a comparative evaluation through Podonos, a third-party evaluation service, testing Chatterbox against ElevenLabs, a leading commercial TTS system. In blind A/B testing, 63.75% of evaluators reportedly preferred Chatterbox's output over ElevenLabs's.[4][5]

However, these results should be interpreted with caution, as the evaluation was limited in scope and conducted by a single third-party service. The testing methodology, sample size, and demographic composition of evaluators have not been independently verified. Additionally, the comparison was limited to a single competitor rather than a comprehensive benchmark against multiple state-of-the-art systems.

Commercial and Research Impact

The release of Chatterbox has been significant for the open-source TTS community, representing one of the first production-grade systems to be freely available under a permissive license. This has enabled developers to integrate high-quality TTS capabilities into applications without licensing costs or vendor dependencies.

The system has found applications in various domains including:

  • Audiobook generation and voice narration
  • Game development for non-player character dialogue
  • Educational content creation
  • Accessibility tools for visually impaired users
  • Research and development in speech synthesis

Resemble AI also offers a commercial "Pro" version with enhanced features, service-level agreements, and custom fine-tuning capabilities for enterprise customers requiring guaranteed performance and support. This version is available through their inference partners, such as FAL.

External Links