Publication Type : Journal Article
Publisher : Journal of Computational and Theoretical Nanoscience
Source : Journal of Computational and Theoretical Nanoscience, Volume 17, Number 1, publication date 2020-01-01, p.316-321 (2020)
Url : https://www.ingentaconnect.com/content/asp/jctn/2020/00000017/00000001/art00049
Campus : Bengaluru
School : School of Engineering
Department : Electronics and Communication
Year : 2020
Abstract : The paper focuses on the use of deep neural networks to convert one person's voice into another person's voice, analogous to a mimic. The work introduces the concept of neural networks and deploys multi-layer deep neural networks to build a framework for voice conversion. The spectral Mel-Frequency Cepstral Coefficients (MFCCs) are converted using a 10-layer deep network, while fundamental frequency (F0) conversion is accomplished by a logarithmic Gaussian normalized transformation. MFCCs are subjected to inverse cepstral filtering, while changes in F0 are incorporated using the Pitch Synchronous OverLap Add (PSOLA) algorithm for re-synthesis. The results are compared using Mel Cepstral Distortion (MCD) for objective evaluation, while an ABX listening test is conducted for subjective assessment. A maximum improvement in MCD of 13.87% is obtained for female-to-male conversion, while the ABX listening test indicates that female-to-male conversion is closest to the target, with an agreement of 76.2%. The method achieves reasonably good performance compared to the state of the art using optimal resources and avoids the need for highly complex computations.
Cite this Research Publication : V. Naveena, Susmitha Vekkot, and K. Priya, J., “Voice Conversion System Based on Deep Neural Networks”, Journal of Computational and Theoretical Nanoscience, vol. 17, pp. 316-321, 2020.
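
Illustrative sketch : The abstract names two quantities that are easy to state in code: the logarithmic Gaussian normalized transformation used for F0 conversion and the Mel Cepstral Distortion (MCD) used for objective evaluation. The Python sketch below uses the textbook forms of both, which may differ in detail from the paper's implementation. The function names, the convention that unvoiced frames have F0 equal to 0, and the assumption that MFCC frames are already time-aligned with the 0th coefficient removed are all assumptions made here, not taken from the paper.

```python
import numpy as np


def log_f0_stats(f0):
    """Mean and standard deviation of log-F0 over voiced frames (F0 > 0)."""
    f0 = np.asarray(f0, dtype=float)
    log_f0 = np.log(f0[f0 > 0])
    return log_f0.mean(), log_f0.std()


def log_gaussian_f0_transform(f0_src, src_stats, tgt_stats):
    """Shift and scale source log-F0 so it matches the target statistics.

    Textbook form (assumed, not confirmed by the paper):
        log F0_conv = mu_t + (sigma_t / sigma_s) * (log F0_src - mu_s)
    Unvoiced frames (F0 == 0) are left at 0.
    """
    f0_src = np.asarray(f0_src, dtype=float)
    mu_s, sigma_s = src_stats
    mu_t, sigma_t = tgt_stats
    out = np.zeros_like(f0_src)
    voiced = f0_src > 0
    out[voiced] = np.exp(
        mu_t + (sigma_t / sigma_s) * (np.log(f0_src[voiced]) - mu_s)
    )
    return out


def mel_cepstral_distortion(mfcc_ref, mfcc_conv):
    """Mean MCD in dB over frames.

    Both inputs have shape (frames, coefficients), are assumed to be
    time-aligned, and are assumed to exclude the 0th (energy) coefficient.
    """
    diff = np.asarray(mfcc_ref, dtype=float) - np.asarray(mfcc_conv, dtype=float)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(per_frame.mean())
```

With these helpers, source and target statistics would be estimated once from training utterances with log_f0_stats, and the converted F0 contour would then be handed to a PSOLA-style re-synthesis step, which is outside the scope of this sketch.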