{{Infobox TTS model
| name = Chatterbox
| developer = [[Resemble AI]]
| release_date = May 2025
| latest_version = Multilingual 2.0
| architecture = [[CosyVoice 2.0]]-based
| parameters = 500 million
| training_data = 500,000 hours of cleaned audio
| languages = 23 languages (multilingual version)
| voices = Zero-shot voice cloning
| voice_cloning = Yes (5-second reference)
| emotion_control = Yes (exaggeration parameter)
| streaming = Yes
| latency = Sub-200ms
| license = [[MIT License]]
| open_source = Yes
| code_repository = [https://github.com/resemble-ai/chatterbox GitHub]
| model_weights = [https://huggingface.co/ResembleAI/chatterbox Hugging Face]
| demo = [https://huggingface.co/spaces/ResembleAI/Chatterbox HF Spaces]
| website = [https://www.resemble.ai/chatterbox/ resemble.ai/chatterbox]
}}
'''Chatterbox''' is an open-source [[text-to-speech]] (TTS) model developed by [[Resemble AI]] and released in May 2025. The 500-million-parameter model uses a modified [[Llama]] backbone in the style of [[CosyVoice|CosyVoice 2.0]]. Resemble AI markets it as the first open-source TTS model with controllable emotion exaggeration, and it has drawn attention for the company's claim that listeners preferred its output to that of an established commercial system in a blind evaluation.

== Development and Release ==

Chatterbox was developed by a three-person team at Resemble AI, a voice technology company founded by Zohaib Ahmed and Saqib Muhammad.<ref>https://www.digitalocean.com/community/tutorials/resemble-chatterbox-tts-text-to-speech</ref> The initial English-only version was released in May 2025 under the [[MIT License]], followed in September 2025 by a multilingual version supporting 23 languages.<ref name="multilingual">https://www.resemble.ai/introducing-chatterbox-multilingual-open-source-tts-for-23-languages/</ref>

The project quickly gained popularity in the open-source community. According to Resemble AI, it accumulated over 1 million downloads on [[Hugging Face]] and more than 11,000 stars on [[GitHub]] within weeks of release.<ref name="multilingual" />

== Technical Architecture ==

Chatterbox is a 500-million-parameter model built on a modified Llama architecture in the style of CosyVoice, which makes it smaller than many contemporary TTS systems. The model was trained on approximately 500,000 hours of cleaned audio data and uses what the developers term "alignment-informed inference" to improve stability during generation.

Key technical features include:

* '''Zero-shot voice cloning''': Ability to clone voices using as little as 5 seconds of reference audio
* '''Emotion exaggeration control''': A single parameter that adjusts emotional intensity from monotone to dramatically expressive
* '''Fast inference''': Reported latency below 200 ms, intended for real-time applications
* '''Multilingual support''': The updated version supports 23 languages including Arabic, Chinese, Hindi, and major European languages
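
The voice cloning and exaggeration features listed above can be sketched in Python. This is a minimal illustration that assumes the <code>chatterbox-tts</code> package exposes <code>ChatterboxTTS.from_pretrained</code> and a <code>generate</code> method accepting <code>audio_prompt_path</code> and <code>exaggeration</code>, as in the usage pattern shown in the project's repository; it should be checked against the current repository before use. The helper names <code>build_generate_kwargs</code> and <code>synthesize</code>, the file paths, and the non-negative check on the exaggeration value are hypothetical and are not part of the library.

```python
from typing import Optional


def build_generate_kwargs(exaggeration: float = 0.5,
                          reference_wav: Optional[str] = None) -> dict:
    """Collect keyword arguments for the model's generate call.

    Hypothetical helper: the non-negative check is an assumption,
    not a documented constraint of the library.
    """
    if exaggeration < 0:
        raise ValueError("exaggeration must be non-negative")
    kwargs = {"exaggeration": exaggeration}
    if reference_wav is not None:
        # Short reference clip used for zero-shot voice cloning.
        kwargs["audio_prompt_path"] = reference_wav
    return kwargs


def synthesize(text: str, out_path: str, exaggeration: float = 0.5,
               reference_wav: Optional[str] = None,
               device: str = "cpu") -> None:
    """Generate speech and write it to out_path (assumed API, see lead-in)."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torchaudio
    from chatterbox.tts import ChatterboxTTS

    model = ChatterboxTTS.from_pretrained(device=device)
    wav = model.generate(text, **build_generate_kwargs(exaggeration, reference_wav))
    torchaudio.save(out_path, wav, model.sr)
```

A call such as <code>synthesize("Hello there.", "out.wav", exaggeration=0.7, reference_wav="speaker.wav")</code> would request more expressive speech in the voice of the reference clip, assuming the package and model weights are installed.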

== Performance Claims and Evaluation ==

Resemble AI commissioned a comparative evaluation from [[Podonos]], a third-party evaluation service, testing Chatterbox against [[ElevenLabs]], a leading commercial TTS system. In blind A/B testing, 63.75% of evaluators reportedly preferred Chatterbox's output to that of ElevenLabs.<ref>https://www.podonos.com/blog/chatterbox</ref><ref>https://www.resemble.ai/chatterbox/</ref>

However, these results should be interpreted with caution, as the evaluation was limited in scope and conducted by a single third-party service. The testing methodology, sample size, and demographic composition of evaluators have not been independently verified. Additionally, the comparison was limited to a single competitor rather than a comprehensive benchmark against multiple state-of-the-art systems.
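
The sample-size caveat can be made concrete with a short calculation. The sketch below computes a 95% Wilson score interval for a 63.75% preference rate under several panel sizes. The panel sizes are hypothetical and chosen only for illustration; they are not figures from the evaluation.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Return the Wilson score interval (low, high) for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical panel sizes; the real evaluation size is not stated in this article.
for n in (80, 400, 2000):
    preferred = round(0.6375 * n)
    low, high = wilson_interval(preferred, n)
    print(f"n={n}: {low:.3f} to {high:.3f}")
```

The interval narrows as the panel grows, which is why the same headline percentage carries very different weight depending on how many listeners took part.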

== Commercial and Research Impact ==

The release of Chatterbox was notable for the open-source TTS community because it made a model positioned for production use freely available under a permissive license. Developers can integrate it into applications without licensing costs or vendor dependencies.

Reported applications include:

* Audiobook generation and voice narration
* Game development for non-player character dialogue
* Educational content creation
* Accessibility tools for visually impaired users
* Research and development in speech synthesis

Resemble AI also offers a commercial "Pro" version with enhanced features, service-level agreements, and custom fine-tuning capabilities for enterprise customers requiring guaranteed performance and support. This version is available through their inference partners, such as FAL.

== External Links ==

* [https://github.com/resemble-ai/chatterbox Official Chatterbox repository]
* [https://huggingface.co/ResembleAI/chatterbox Model on Hugging Face]
* [https://huggingface.co/spaces/ResembleAI/Chatterbox Interactive demo]
* [https://resemble-ai.github.io/chatterbox_demopage/ Demo page with audio samples]

[[Category:Speech synthesis]]
[[Category:Open-source software]]
[[Category:Artificial intelligence]]
[[Category:Voice technology]]
[[Category:MIT License software]]