NeuCodec
NeuCodec is a neural audio codec developed by Neuphonic, designed for efficient speech tokenization and high-quality audio compression at relatively low bitrates.
Technical Specifications
- Bitrate: 0.8 kbps
- Output sample rate: 24 kHz
- Frame rate: 50 Hz
- Quantization: Finite Scalar Quantization (FSQ) with a single codebook
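The figures above are related by simple arithmetic. The sketch below works through it, assuming one token is emitted per frame and that the stated bitrate counts only raw token indices, with no entropy coding or container overhead. The helper name `tokens_for_duration` is hypothetical and used only for illustration.

```python
import math

# Values taken from the specification list above.
BITRATE_BPS = 800               # 0.8 kbps
FRAME_RATE_HZ = 50              # frames per second
OUTPUT_SAMPLE_RATE_HZ = 24_000  # decoder output sample rate

# Assumption: one token per frame, stored as a raw index.
bits_per_frame = BITRATE_BPS / FRAME_RATE_HZ

# A single codebook addressed by that many bits has 2**bits entries.
codebook_size = 2 ** int(bits_per_frame)

# Each frame stands for this many output samples.
samples_per_frame = OUTPUT_SAMPLE_RATE_HZ // FRAME_RATE_HZ


def tokens_for_duration(seconds: float) -> int:
    """Hypothetical helper: number of tokens needed for a clip of this length."""
    return math.ceil(seconds * FRAME_RATE_HZ)


print(bits_per_frame, codebook_size, samples_per_frame, tokens_for_duration(10))
```

Under these assumptions each frame carries 16 bits, which corresponds to a single codebook of 65,536 entries, and a 10-second clip becomes 500 tokens.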
Architecture
NeuCodec largely extends X-Codec 2.0. It uses a dual-encoder design that combines an acoustic encoder (BigCodec) with a semantic encoder (Wav2Vec2-BERT). Because FSQ with a single codebook yields one quantized vector per frame rather than several parallel codebook streams, the output is well suited to downstream Speech Language Model (SpeechLM) training.
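To illustrate the general idea behind Finite Scalar Quantization, the following is a simplified sketch, not NeuCodec's implementation. It bounds each latent dimension with `tanh`, rounds it to one of a small number of levels, and packs the resulting digits into a single integer token. The level configuration (8 dimensions with 4 levels each) is an assumption chosen only because it gives 65,536 codes; the function names are hypothetical, and the offset handling that published FSQ formulations use for even level counts is omitted.

```python
import math

# Assumed, illustrative configuration: 8 dimensions x 4 levels = 4**8 codes.
LEVELS = [4] * 8


def fsq_quantize(z, levels=LEVELS):
    """Round each bounded latent value to an integer digit in [0, n_levels - 1]."""
    digits = []
    for value, n_levels in zip(z, levels):
        bounded = (math.tanh(value) + 1.0) / 2.0  # squash into [0, 1]
        digits.append(round(bounded * (n_levels - 1)))
    return digits


def digits_to_index(digits, levels=LEVELS):
    """Pack per-dimension digits into one token (mixed radix, first digit lowest)."""
    index = 0
    for digit, n_levels in zip(reversed(digits), reversed(levels)):
        index = index * n_levels + digit
    return index


def index_to_digits(index, levels=LEVELS):
    """Inverse of digits_to_index."""
    digits = []
    for n_levels in levels:
        digits.append(index % n_levels)
        index //= n_levels
    return digits


latent = [-10.0, 10.0, 0.3, -0.3, 2.0, -2.0, 0.1, 1.0]
token = digits_to_index(fsq_quantize(latent))
print(token, index_to_digits(token))
```

Because the codebook is implicit in the rounding grid, there is no learned table of code vectors to look up, and every frame maps to exactly one integer token.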
Features
- Compresses and reconstructs audio with near-inaudible reconstruction loss
- Accepts 16 kHz input and reconstructs audio at 24 kHz (built-in upsampling)
- Commercial use permitted
- Pre-encoded datasets available (Emilia-YODAS compressed from 1.7 TB to 41 GB)
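The dataset figures in the last item can be put in perspective with a rough calculation. This is a back-of-the-envelope sketch that assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and that the encoded files hold nothing but raw tokens at 0.8 kbps. Neither assumption is confirmed by the source, so the implied duration is only an estimate.

```python
# Figures from the feature list; decimal units are an assumption.
ORIGINAL_BYTES = 1.7e12  # 1.7 TB
ENCODED_BYTES = 41e9     # 41 GB
BITRATE_BPS = 800        # 0.8 kbps

# How much smaller the pre-encoded dataset is than the original audio.
compression_ratio = ORIGINAL_BYTES / ENCODED_BYTES

# Assumption: the encoded size is purely token data at the stated bitrate.
implied_hours = ENCODED_BYTES * 8 / BITRATE_BPS / 3600

print(f"compression ratio: about {compression_ratio:.1f}x")
print(f"implied audio duration: about {implied_hours:,.0f} hours")
```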
Applications
NeuCodec serves as the audio codec for NeuTTS Air, Neuphonic's on-device text-to-speech model with voice cloning capabilities. It is intended for researchers and developers who are building text-to-speech systems and need efficient speech tokenization without developing their own codec.
Availability
NeuCodec is available on Hugging Face and GitHub under the neuphonic/neucodec repository and can be installed via pip.