
{{Infobox TTS model
| name = Orpheus TTS
| developer = [[Canopy Labs]]
| release_date = March 18, 2025
| latest_version = 3B 0.1
| architecture = [[LLM-Based]]
| parameters = 3 billion
| training_data = 100k+ hours (English)
| languages = English (multilingual in preview)
| voices = 8 distinct voices
| voice_cloning = Yes (zero-shot)
| emotion_control = Yes (tag-based)
| streaming = Yes
| latency = ~200ms
| license = [[Apache License 2.0]]
| open_source = Yes
| code_repository = [https://github.com/canopyai/Orpheus-TTS GitHub]
| model_weights = [https://huggingface.co/canopylabs/orpheus-3b-0.1-ft Hugging Face]
| demo = [https://huggingface.co/spaces/MohamedRashad/Orpheus-TTS HF Spaces]
| website = [https://canopylabs.ai/releases/towards_human_sounding_tts Canopy Labs]
}}

'''Orpheus TTS''' is an open-source [[text-to-speech]] (TTS) system developed by Canopy Labs and released in March 2025. Built on the [[Llama (language model)|Llama-3.2-3B]] architecture, it generates speech by having a large language model predict audio tokens, rather than relying on a traditional TTS-specific architecture.

== Development and Release ==

Orpheus TTS was developed by Canopy Labs, an artificial intelligence startup founded with the stated mission of creating "digital humans that are indistinguishable from real humans."<ref>https://canopylabs.ai/</ref> The model weights and training code were publicly released on March 18, 2025, under the [[Apache License 2.0]].<ref>https://github.com/canopyai/Orpheus-TTS</ref>

== Technical Architecture ==

Orpheus TTS differs from conventional text-to-speech systems by using a modified version of Meta's Llama-3.2-3B language model as its foundation. The model takes a text prompt as input and generates audio tokens, which are decoded to a waveform by the 24 kHz variant of the [[SNAC]] audio tokenizer. The system was trained on over 100,000 hours of English speech data combined with billions of tokens of textual QA pairs, a hybrid approach designed to preserve linguistic understanding while adding speech synthesis capabilities.
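
The idea of a language model emitting audio tokens can be illustrated with a minimal sketch. It is not the Orpheus implementation: it assumes a three-level code hierarchy in which each coarse code is paired with 2 mid-level and 4 fine-level codes (7 tokens per frame), and the function names <code>flatten_frames</code> and <code>unflatten_frames</code> are hypothetical. The ordering and any vocabulary offsets used by the actual model may differ.

```python
from typing import List, Sequence, Tuple

# Assumed layout (an illustration, not confirmed by the article):
# one coarse code, two mid codes, four fine codes per frame.
MID_PER_FRAME = 2
FINE_PER_FRAME = 4
TOKENS_PER_FRAME = 1 + MID_PER_FRAME + FINE_PER_FRAME


def flatten_frames(
    coarse: Sequence[int], mid: Sequence[int], fine: Sequence[int]
) -> List[int]:
    """Interleave three code levels into one flat token sequence."""
    n = len(coarse)
    if len(mid) != MID_PER_FRAME * n or len(fine) != FINE_PER_FRAME * n:
        raise ValueError("code levels must have lengths in the ratio 1:2:4")
    flat: List[int] = []
    for i in range(n):
        flat.append(coarse[i])
        flat.extend(mid[MID_PER_FRAME * i : MID_PER_FRAME * (i + 1)])
        flat.extend(fine[FINE_PER_FRAME * i : FINE_PER_FRAME * (i + 1)])
    return flat


def unflatten_frames(flat: Sequence[int]) -> Tuple[List[int], List[int], List[int]]:
    """Split a flat token sequence back into the three code levels."""
    if len(flat) % TOKENS_PER_FRAME != 0:
        raise ValueError("token count must be a multiple of the frame size")
    coarse: List[int] = []
    mid: List[int] = []
    fine: List[int] = []
    for start in range(0, len(flat), TOKENS_PER_FRAME):
        frame = flat[start : start + TOKENS_PER_FRAME]
        coarse.append(frame[0])
        mid.extend(frame[1 : 1 + MID_PER_FRAME])
        fine.extend(frame[1 + MID_PER_FRAME :])
    return coarse, mid, fine
```

Under this assumed layout, a language model only has to predict a single flat sequence, and the decoder side regroups every 7 tokens into the three levels before reconstructing audio.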

The model supports streaming output out of the box, with a reported streaming latency of approximately 200 milliseconds.<ref>https://www.baseten.co/blog/canopy-labs-selects-baseten-as-preferred-inference-provider-for-orpheus-tts-model/</ref>
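
Streaming latency is commonly understood as the time between submitting a request and receiving the first chunk of audio. The sketch below shows one way to measure that quantity for any iterator of audio chunks. It does not call the Orpheus API: <code>fake_audio_stream</code> is a hypothetical stand-in that sleeps to simulate generation, and <code>time_to_first_chunk</code> is an illustrative helper name.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_audio_stream(
    num_chunks: int = 3, delay_s: float = 0.01, chunk_bytes: int = 480
) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS client: yields silent chunks."""
    for _ in range(num_chunks):
        time.sleep(delay_s)  # simulate time spent generating the next chunk
        yield b"\x00" * chunk_bytes


def time_to_first_chunk(stream: Iterable[bytes]) -> Tuple[float, int]:
    """Return (seconds until the first chunk arrived, total bytes received)."""
    start = time.perf_counter()
    first_latency = None
    total_bytes = 0
    for chunk in stream:
        if first_latency is None:
            first_latency = time.perf_counter() - start
        total_bytes += len(chunk)
    if first_latency is None:
        raise ValueError("stream produced no audio")
    return first_latency, total_bytes


if __name__ == "__main__":
    latency, size = time_to_first_chunk(fake_audio_stream())
    print(f"first chunk after {latency * 1000:.1f} ms, {size} bytes total")
```

Replacing <code>fake_audio_stream</code> with a real streaming client would measure end-to-end latency, which also depends on hardware, network conditions, and the inference server.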

== Performance Claims and Evaluation ==

Canopy Labs claims that Orpheus TTS delivers "natural intonation, emotion, and rhythm that is superior to SOTA closed source models," positioning it as competitive with established commercial systems such as [[ElevenLabs]] and other proprietary text-to-speech services.<ref>https://huggingface.co/canopylabs/orpheus-3b-0.1-ft</ref>

However, these performance assertions are based primarily on internal evaluations and subjective assessments rather than standardized benchmarks or peer-reviewed studies. The lack of comprehensive comparative analysis with established TTS systems has led to some skepticism within the research community about the extent of its claimed superiority.

== Model Variants ==

The model is available in three variants:

* '''Pretrained model''': The base model trained on the full dataset, suitable for research and custom fine-tuning
* '''Fine-tuned production model''': An optimized version designed for everyday TTS applications, fine-tuned on several voices
* '''Multilingual model''': A family of fine-tuned models with support for other languages

== External Links ==

* [https://github.com/canopyai/Orpheus-TTS Official Orpheus TTS repository]
* [https://huggingface.co/canopylabs/orpheus-3b-0.1-ft Model on Hugging Face]
* [https://www.baseten.co/library/orpheus-tts/ Orpheus TTS on Baseten]

== References ==

<references />

[[Category:Speech synthesis]]
[[Category:Open-source software]]
[[Category:Artificial intelligence]]
[[Category:Large language models]]