Orpheus TTS: Difference between revisions

Orpheus TTS
Model Information
Developer:	Canopy Labs
Release date:	March 18, 2025
Latest version:	3B 0.1
Architecture:	LLM-based
Parameters:	3 billion
Training data:	100k+ hours (English)
Capabilities
Languages:	English (multilingual in preview)
Voices:	8 distinct voices
Voice cloning:	Yes (zero-shot)
Emotion control:	Yes (tag-based)
Streaming:	Yes
Latency:	~200 ms
Availability
License:	Apache License 2.0
Open source:	Yes
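Repository:	GitHub (https://github.com/canopyai/Orpheus-TTS)
Model weights:	Hugging Face (https://huggingface.co/canopylabs/orpheus-3b-0.1-ft)
Demo:	HF Spaces (https://huggingface.co/spaces/MohamedRashad/Orpheus-TTS)
Website:	Canopy Labs (https://canopylabs.ai/releases/towards_human_sounding_tts)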


Latest revision as of 16:06, 20 September 2025

Orpheus TTS is an open-source text-to-speech (TTS) system developed by Canopy Labs and released in March 2025. Built on Meta's Llama-3.2-3B architecture, it takes a novel approach: instead of a traditional TTS-specific architecture, it uses a large language model that generates audio tokens.

Development and Release

Orpheus TTS was developed by Canopy Labs, an artificial intelligence startup founded with the stated mission of creating "digital humans that are indistinguishable from real humans."^[1] Canopy Labs publicly released the model weights and training code on March 18, 2025, under the Apache License 2.0, making both freely available.^[2]

Technical Architecture

Orpheus TTS differs from conventional text-to-speech systems by using a modified version of Meta's Llama-3.2-3B language model as its foundation. The model takes a text prompt and generates audio tokens from the 24 kHz variant of the SNAC audio tokenizer; SNAC's decoder then converts these tokens back into a waveform. The system was trained on over 100,000 hours of English speech combined with billions of tokens of textual question-answer pairs, a hybrid approach designed to preserve the model's linguistic understanding while adding speech-synthesis capabilities.
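To illustrate how a language model can emit audio as ordinary tokens, the sketch below flattens SNAC's three codebook levels (coarse to fine, with one, two and four codes per coarse time step) into frames of seven tokens and back again. The slot order and the numeric offsets (`CODEBOOK_SIZE = 4096`, `AUDIO_TOKEN_BASE = 128266`) are assumed to follow the project's public fine-tuning scripts and may differ between releases; `interleave` and `deinterleave` are hypothetical helper names, not part of any Orpheus API.

```python
from typing import List, Sequence, Tuple

# Assumed constants; they may differ between Orpheus releases.
CODEBOOK_SIZE = 4096        # entries per SNAC codebook
AUDIO_TOKEN_BASE = 128266   # first token ID reserved for audio codes (assumption)
TOKENS_PER_FRAME = 7        # 1 coarse + 2 medium + 4 fine codes


def interleave(l0: Sequence[int], l1: Sequence[int], l2: Sequence[int]) -> List[int]:
    """Flatten three SNAC levels (sizes n, 2n, 4n) into 7-token frames."""
    if len(l1) != 2 * len(l0) or len(l2) != 4 * len(l0):
        raise ValueError("expected level sizes n, 2n and 4n")
    tokens: List[int] = []
    for i in range(len(l0)):
        # Slot order within one coarse time step (assumed from the fine-tuning scripts).
        frame = [l0[i], l1[2 * i], l2[4 * i], l2[4 * i + 1],
                 l1[2 * i + 1], l2[4 * i + 2], l2[4 * i + 3]]
        for slot, code in enumerate(frame):
            if not 0 <= code < CODEBOOK_SIZE:
                raise ValueError(f"code {code} outside codebook")
            # Each slot gets its own ID range so the model can tell positions apart.
            tokens.append(AUDIO_TOKEN_BASE + slot * CODEBOOK_SIZE + code)
    return tokens


def deinterleave(tokens: Sequence[int]) -> Tuple[List[int], List[int], List[int]]:
    """Invert interleave(), ignoring a trailing partial frame."""
    l0: List[int] = []
    l1: List[int] = []
    l2: List[int] = []
    for start in range(0, len(tokens) - TOKENS_PER_FRAME + 1, TOKENS_PER_FRAME):
        chunk = tokens[start:start + TOKENS_PER_FRAME]
        # Remove the per-slot offset to recover the raw codebook index.
        codes = [t - AUDIO_TOKEN_BASE - slot * CODEBOOK_SIZE
                 for slot, t in enumerate(chunk)]
        l0.append(codes[0])
        l1.extend([codes[1], codes[4]])
        l2.extend([codes[2], codes[3], codes[5], codes[6]])
    return l0, l1, l2
```

Giving every slot its own range of token IDs is what lets a single flat token stream carry a hierarchical codec: the position within the frame is recoverable from the ID alone, so `deinterleave` needs no extra bookkeeping.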

The model supports streaming out of the box and can achieve approximately 200 milliseconds of streaming latency.^[3]
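Streaming is possible because audio can be decoded in short chunks as soon as enough tokens exist. The sketch below assumes that the 24 kHz SNAC model's coarsest level advances 2048 samples per step and that each step is flattened into seven tokens. Under those assumptions, it computes the token rate needed for real-time playback and groups a token stream into overlapping decode windows. The window size (`window_frames = 4`) and all function names are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List

# Assumptions about the 24 kHz SNAC variant and the Orpheus token layout.
SAMPLE_RATE = 24_000       # Hz
SAMPLES_PER_FRAME = 2048   # samples per coarse SNAC step (assumed)
TOKENS_PER_FRAME = 7       # tokens emitted per coarse step (assumed)


def realtime_token_rate() -> float:
    """Tokens per second the model must generate to keep pace with playback."""
    frames_per_second = SAMPLE_RATE / SAMPLES_PER_FRAME
    return TOKENS_PER_FRAME * frames_per_second


def sliding_windows(tokens: Iterable[int], window_frames: int = 4) -> Iterator[List[int]]:
    """Each time a frame completes, yield the most recent `window_frames` frames.

    A decoder could convert each window to audio and emit only one frame's
    worth of new samples, using the other frames as context at chunk edges.
    """
    window = window_frames * TOKENS_PER_FRAME
    buf: Deque[int] = deque(maxlen=window)  # keeps only the latest window
    for count, tok in enumerate(tokens, start=1):
        buf.append(tok)
        if count % TOKENS_PER_FRAME == 0 and count >= window:
            yield list(buf)


def time_to_first_window(gen_tokens_per_s: float, window_frames: int = 4) -> float:
    """Seconds of generation before the first window can be decoded."""
    return window_frames * TOKENS_PER_FRAME / gen_tokens_per_s
```

Under these assumptions `realtime_token_rate()` returns 82.03125, so generation must stay above roughly 82 tokens per second to avoid playback stalls. At a hypothetical generation speed of 140 tokens per second, `time_to_first_window(140.0)` gives 0.2 s before any decoding or network overhead.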

Performance Claims and Evaluation

Canopy Labs claims that Orpheus TTS delivers "natural intonation, emotion, and rhythm that is superior to SOTA closed source models," positioning it as competitive with established commercial systems such as ElevenLabs and other proprietary text-to-speech services.^[4]

However, these claims rest primarily on internal evaluations and subjective assessments rather than on standardized benchmarks or peer-reviewed studies. The lack of a comprehensive comparison with established TTS systems has led to some skepticism in the research community about the extent of its claimed superiority.

Model Variants

The model is available in three variants:

Pretrained model: The base model trained on the full dataset, suitable for research and custom fine-tuning
Fine-tuned production model: An optimized version designed for everyday TTS applications, fine-tuned on several voices
Multilingual model: A family of fine-tuned models with support for other languages
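The fine-tuned production model is prompted with plain text that names one of its voices and may contain emotion tags. The helper below is a minimal sketch that assumes the `{voice}: {text}` prompt convention and the voice and tag lists given in the project README at the time of writing; both lists are assumptions and may change. The released inference code also wraps this text in additional special tokens, which are omitted here, and `build_prompt` is a hypothetical name.

```python
import re

# Assumed from the project README at the time of writing; may change.
VOICES = ("tara", "leah", "jess", "leo", "dan", "mia", "zac", "zoe")
EMOTION_TAGS = ("laugh", "chuckle", "sigh", "cough", "sniffle", "groan", "yawn", "gasp")
_TAG_RE = re.compile(r"<([a-z]+)>")  # matches tags such as <laugh>


def build_prompt(text: str, voice: str = "tara") -> str:
    """Return a '{voice}: {text}' prompt, rejecting unknown voices or tags."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    unknown = [tag for tag in _TAG_RE.findall(text) if tag not in EMOTION_TAGS]
    if unknown:
        raise ValueError(f"unsupported tags: {unknown}")
    return f"{voice}: {text}"
```

For example, `build_prompt("I can't believe it <laugh> worked.", voice="leo")` returns `"leo: I can't believe it <laugh> worked."`. Validating tags up front is a design choice: an unrecognized tag would otherwise simply be read aloud or ignored by the model.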

External Links

[1] https://canopylabs.ai/

[2] https://github.com/canopyai/Orpheus-TTS

[3] https://www.baseten.co/blog/canopy-labs-selects-baseten-as-preferred-inference-provider-for-orpheus-tts-model/

[4] https://huggingface.co/canopylabs/orpheus-3b-0.1-ft

