NeuCodec

NeuCodec is a neural audio codec developed by Neuphonic, designed for efficient speech tokenization and high-quality audio compression at relatively low bitrates.

Technical Specifications

Bitrate: 0.8 kbps
Output sample rate: 24 kHz
Frame rate: 50 Hz
Quantization: Finite Scalar Quantization (FSQ) with a single codebook
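
The figures above imply a per-frame token budget that can be checked with simple arithmetic. The sketch below assumes the full 0.8 kbps is spent on exactly one token per frame from the single codebook. The constant and function names are illustrative and are not part of any NeuCodec API.

```python
# Figures taken from the specification list above.
BITRATE_BPS = 800      # 0.8 kbps
FRAME_RATE_HZ = 50     # frames (tokens) per second

# Assumption: the whole bitrate is spent on one token per frame.
bits_per_token = BITRATE_BPS // FRAME_RATE_HZ    # 16 bits
implied_codebook_size = 2 ** bits_per_token      # 65536 possible tokens


def tokens_for_duration(seconds: float) -> int:
    """Number of tokens needed to represent `seconds` of audio."""
    return round(seconds * FRAME_RATE_HZ)


print(bits_per_token)             # 16
print(implied_codebook_size)      # 65536
print(tokens_for_duration(10.0))  # 500
```

Under that assumption, ten seconds of speech corresponds to 500 tokens, each carrying 16 bits.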

Architecture

NeuCodec largely extends X-Codec 2.0. It uses a dual-encoder design that combines an audio encoder (BigCodec) with a semantic encoder (Wav2Vec2-BERT). The FSQ-based design produces a single quantized vector per frame, which makes it well suited to downstream Speech Language Model (SpeechLM) training.
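
Finite Scalar Quantization itself can be sketched in a few lines of NumPy. The example below is a generic illustration of the technique, not NeuCodec's implementation: the function name `fsq_quantize`, the choice of 8 latent dimensions with 4 levels each, and the small `eps` margin are all assumptions, picked so that the token range is 4^8 = 65,536. Each latent dimension is squashed with tanh, rounded to one of a few integer levels, and the per-dimension digits are combined into one token index per frame.

```python
import numpy as np


def fsq_quantize(z, levels, eps=1e-3):
    """Map each row of z (shape [frames, dims]) to one integer token.

    `levels[i]` is the number of quantization levels for dimension i.
    This is a generic FSQ sketch; the level layout is hypothetical.
    """
    levels = np.asarray(levels, dtype=np.int64)
    z = np.asarray(z, dtype=np.float64)

    # Bound each dimension so that rounding yields exactly levels[i] values.
    half_l = (levels - 1) * (1 - eps) / 2
    offset = np.where(levels % 2 == 0, 0.5, 0.0)  # even level counts are asymmetric
    shift = np.arctanh(offset / half_l)
    bounded = np.tanh(z + shift) * half_l - offset

    # Round to integers, then shift each digit into the range [0, levels[i] - 1].
    digits = np.round(bounded).astype(np.int64) + levels // 2

    # Combine the digits into a single index using a mixed-radix basis.
    basis = np.concatenate(([1], np.cumprod(levels[:-1])))
    return digits @ basis


# One second of fake latents at 50 frames per second (random, for illustration only).
latents = np.random.default_rng(0).normal(size=(50, 8))
tokens = fsq_quantize(latents, levels=[4] * 8)
print(tokens.shape)  # (50,)
```

Because the codebook is implicit in the rounding grid, there is no learned embedding table to look up, and each frame maps to exactly one integer that a language model can treat as a vocabulary item.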

Features

Compresses and reconstructs audio with near-inaudible reconstruction loss
Upsamples from 16 kHz to 24 kHz
Commercial use permitted
Pre-encoded datasets available (Emilia-YODAS compressed from 1.7 TB to 41 GB)
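
The sample-rate and dataset figures above can be put in perspective with a short calculation. This is a rough sketch: it reuses the 50 Hz frame rate from the specifications, assumes decimal units (1 TB = 1000 GB), and uses variable names that are purely illustrative.

```python
INPUT_SAMPLE_RATE = 16_000   # Hz, encoder input
OUTPUT_SAMPLE_RATE = 24_000  # Hz, decoder output
FRAME_RATE_HZ = 50           # from the technical specifications

# Audio samples covered by one frame on each side of the codec.
input_samples_per_frame = INPUT_SAMPLE_RATE // FRAME_RATE_HZ    # 320
output_samples_per_frame = OUTPUT_SAMPLE_RATE // FRAME_RATE_HZ  # 480
upsampling_factor = OUTPUT_SAMPLE_RATE / INPUT_SAMPLE_RATE      # 1.5

# Emilia-YODAS: size before and after pre-encoding.
# Assumption: decimal units, so 1.7 TB = 1700 GB.
original_gb = 1.7 * 1000
encoded_gb = 41
dataset_ratio = original_gb / encoded_gb

print(input_samples_per_frame, output_samples_per_frame, upsampling_factor)
print(f"dataset shrinks by roughly {dataset_ratio:.0f}x")
```

The ratio depends on how the original audio was stored, so it describes this particular dataset rather than a general property of the codec.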

Applications

NeuCodec serves as the audio codec for NeuTTS Air, Neuphonic's on-device text-to-speech model with voice cloning capabilities. It's intended for researchers and developers building text-to-speech systems who need efficient speech tokenization without developing their own codec.

Availability

NeuCodec is available on Hugging Face and GitHub under the neuphonic/neucodec repository and can be installed via pip.
