
A Comparative Study of Text-to-Speech (TTS) Models and Vocoder Combinations for High-Quality Synthesized Speech

Publication Type : Conference Paper

Publisher : IEEE

Source : 2023 7th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 2023, pp. 311-315

Url : https://ieeexplore.ieee.org/document/10395349

Campus : Coimbatore

School : School of Artificial Intelligence

Year : 2023

Abstract : Deep neural networks and machine learning have significantly improved the quality and naturalness of Text-to-Speech (TTS) synthesis, the process of converting written text into spoken audio. This work presents a comprehensive comparison of multiple TTS models and vocoders. The primary objective is to identify the most effective TTS model-vocoder combination and to understand the advantages and disadvantages of each. To achieve this, we conduct rigorous evaluations of various TTS model-vocoder pairings on the Lj-Speech-en dataset. We assessed the naturalness of the synthesized speech through subjective Mean Opinion Score (MOS) tests with 40 listeners. Experimental results demonstrate that the FastSpeech2 and MB-MelGAN combination outperforms all other configurations, yielding remarkably high-quality audio with an MOS of 4.3595.
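The abstract reports one MOS per configuration but does not say how the listener ratings were aggregated. Below is a minimal sketch, assuming each listener gives an integer rating on the usual 1 to 5 scale and the MOS is the arithmetic mean of all ratings for a system. The function name `mean_opinion_score`, the normal-approximation 95% confidence interval, and the sample ratings are illustrative assumptions, not the authors' procedure or data.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% CI) for 1-5 ratings.

    Hypothetical helper: MOS is taken as the plain arithmetic mean, and the
    interval uses a normal approximation (1.96 * standard error), which is a
    common choice but not necessarily the one used in the paper.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mos = statistics.fmean(ratings)
    if len(ratings) < 2:
        # A single rating has no sample spread, so no interval is defined.
        return mos, float("nan")
    sd = statistics.stdev(ratings)  # sample standard deviation
    half_width = 1.96 * sd / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for one TTS model-vocoder pair (NOT data from the paper).
ratings = [5, 4, 4, 5, 3, 4, 5, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.4f} ± {ci:.4f}")
```

Reporting the interval alongside the mean helps judge whether a gap between two configurations is larger than the listener-to-listener variation.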

Cite this Research Publication : P. V. Reddy, D. Rohith, S. V. S. Dhanush, T. S. Ganesh Kumar, J. L. G and B. Premjith, "A Comparative Study of Text-to-Speech (TTS) Models and Vocoder Combinations for High-Quality Synthesized Speech," 2023 7th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 2023, pp. 311-315
