Orpheus TTS
{{Infobox TTS model
| name = Orpheus TTS
| developer = [[Canopy Labs]]
| release_date = March 18, 2025
| latest_version = 3B 0.1
| architecture = [[LLM-Based]]
| parameters = 3 billion
| training_data = 100k+ hours (English)
| languages = English (multilingual in preview)
| voices = 8 distinct voices
| voice_cloning = Yes (zero-shot)
| emotion_control = Yes (tag-based)
| streaming = Yes
| latency = ~200ms
| license = [[Apache License 2.0]]
| open_source = Yes
| code_repository = [https://github.com/canopyai/Orpheus-TTS GitHub]
| model_weights = [https://huggingface.co/canopylabs/orpheus-3b-0.1-ft Hugging Face]
| demo = [https://huggingface.co/spaces/MohamedRashad/Orpheus-TTS HF Spaces]
| website = [https://canopylabs.ai/releases/towards_human_sounding_tts Canopy Labs]
}}
'''Orpheus TTS''' is an open-source [[text-to-speech]] (TTS) system developed by Canopy Labs and released in March 2025. Built on the [[Llama (language model)|Llama-3.2-3B]] architecture, it generates speech by having a large language model predict audio tokens instead of relying on a traditional TTS-specific architecture.

== Development and Release ==
Orpheus TTS was developed by Canopy Labs, an artificial intelligence startup founded with the stated mission of creating "digital humans that are indistinguishable from real humans."<ref>https://canopylabs.ai/</ref> The code and model weights were publicly released on March 18, 2025, under the [[Apache License]] 2.0, making both the weights and the training code freely available.<ref>https://github.com/canopyai/Orpheus-TTS</ref>

== Technical Architecture ==
Orpheus TTS differs from conventional text-to-speech systems by using a modified version of Meta's Llama-3.2-3B language model as its foundation. It takes a text prompt as input and generates audio tokens using the 24 kHz variant of the [[SNAC]] audio tokenizer.
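
For a rough sense of scale, the sketch below converts durations into sample counts at the 24 kHz rate of the SNAC variant named above. It is illustrative arithmetic only and not part of the Orpheus codebase: the helper names are hypothetical, and the 16-bit mono PCM output format is an assumption rather than something stated by Canopy Labs.

```python
SAMPLE_RATE_HZ = 24_000  # rate of the 24 kHz SNAC variant described in the article


def samples_for(duration_ms: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Return the number of mono audio samples that span duration_ms."""
    return round(sample_rate_hz * duration_ms / 1000)


def pcm_bytes_for(duration_ms: float,
                  sample_rate_hz: int = SAMPLE_RATE_HZ,
                  bytes_per_sample: int = 2) -> int:
    """Return the raw PCM size in bytes for duration_ms of mono audio.

    Assumption: 16-bit (2-byte) samples; change bytes_per_sample otherwise.
    """
    return samples_for(duration_ms, sample_rate_hz) * bytes_per_sample


if __name__ == "__main__":
    # The ~200 ms latency figure quoted for the model spans 4800 samples.
    print(samples_for(200))
    # One second of 16-bit mono audio at 24 kHz occupies 48000 bytes.
    print(pcm_bytes_for(1000))
```

Under these assumptions, the quoted ~200 ms figure corresponds to about 4,800 samples of audio at 24 kHz.
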
The system was trained on over 100,000 hours of English speech data combined with billions of tokens of textual question-answer pairs, a hybrid approach designed to preserve linguistic understanding while adding speech synthesis capabilities. The model supports streaming out of the box and can achieve roughly 200 milliseconds of streaming latency.<ref>https://www.baseten.co/blog/canopy-labs-selects-baseten-as-preferred-inference-provider-for-orpheus-tts-model/</ref>

== Performance Claims and Evaluation ==
Canopy Labs claims that Orpheus TTS delivers "natural intonation, emotion, and rhythm that is superior to SOTA closed source models," positioning it as competitive with established commercial systems such as [[ElevenLabs]] and other proprietary text-to-speech services.<ref>https://huggingface.co/canopylabs/orpheus-3b-0.1-ft</ref>

However, these claims rest primarily on internal evaluations and subjective assessments rather than standardized benchmarks or peer-reviewed studies. The lack of comprehensive comparative analysis against established TTS systems has led to some skepticism within the research community about the extent of the claimed superiority.

== Model Variants ==
The model is available in three variants:
* '''Pretrained model''': The base model trained on the full dataset, suitable for research and custom fine-tuning
* '''Fine-tuned production model''': An optimized version designed for everyday TTS applications, fine-tuned on several voices
* '''Multilingual model''': A family of fine-tuned models with support for other languages

== External Links ==
* [https://github.com/canopyai/Orpheus-TTS Official Orpheus TTS repository]
* [https://huggingface.co/canopylabs/orpheus-3b-0.1-ft Model on Hugging Face]
* [https://www.baseten.co/library/orpheus-tts/ Orpheus TTS on Baseten]

[[Category:Speech synthesis]]
[[Category:Open-source software]]
[[Category:Artificial intelligence]]
[[Category:Large language models]]
<references />