
== End-to-End Capabilities ==

A significant contribution of VITS2 was reducing the model's dependence on phoneme conversion. The evaluation transcribed synthesized speech with an automatic speech recognition (ASR) system and measured the character error rate (CER). With normalized text input, VITS2 reached a CER of 4.01%, compared with 3.92% with phoneme sequences. This gap of only 0.09 percentage points suggests that fully end-to-end training, without an explicit phoneme-conversion preprocessing step, is feasible.
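CER is the character-level edit distance between an ASR transcript and the reference text, divided by the number of reference characters: (substitutions + deletions + insertions) / reference length. The following is a minimal sketch of that metric only. It assumes both strings have already been normalized the same way, and it does not reproduce the ASR model or text normalization used in the VITS2 evaluation. The helper names `levenshtein`, `character_error_rate` and `corpus_cer` are hypothetical. The corpus-level variant pools edits and characters over all utterances, which is one common convention rather than a confirmed detail of the VITS2 setup.

```python
from typing import Iterable, Tuple


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    needed to turn the hypothesis into the reference."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: reference char missing in hypothesis
                curr[j - 1] + 1,         # insertion: extra char in hypothesis
                prev[j - 1] + (r != h),  # substitution (cost 0 if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER for a single utterance; can exceed 1.0 when there are many insertions."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled CER: total edits divided by total reference characters (assumed convention)."""
    edits = 0
    chars = 0
    for reference, hypothesis in pairs:
        edits += levenshtein(reference, hypothesis)
        chars += len(reference)
    if chars == 0:
        raise ValueError("references must contain at least one character")
    return edits / chars


if __name__ == "__main__":
    # One substituted character ("e" -> "a") over 16 reference characters.
    print(f"{character_error_rate('speech synthesis', 'speach synthesis'):.4f}")
```

Running the script prints `0.0625`, which is one error over 16 characters. A reported figure such as 4.01% corresponds to a value of 0.0401 under this definition.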