Publication Type : Journal Article
Publisher : Applied Acoustics
Source : Applied Acoustics, Volume 170, p.107519 (2020)
Url : https://www.sciencedirect.com/science/article/pii/S0003682X2030623X
Keywords : Arousal, Categorical, Cross-validation, Data augmentation, Emotion, Mel frequency, Mixed-lingual, Multilingual, SMOTE, Valence
Campus : Bengaluru
School : School of Engineering
Department : Electronics and Communication
Year : 2020
Abstract : In the past decade, research on improving man–machine communication has focused on emotion recognition using audio cues. Several effective monolingual, multilingual, and cross-corpus speech emotion recognition (SER) systems have been developed; however, they are limited to recognizing emotions from databases of monolingual speech, primarily in either the categorical or the dimensional emotion space. For multilingual countries and federations such as India, Russia, and the European Union, these limitations can be problematic. Moreover, the performance of existing models in an environment where diverse languages are mixed is unclear. To address these issues, we propose a mixed-lingual SER system that covers five diverse languages, including dialect variability. Mixed-lingual corpora are developed from available standard speech emotion databases. We also propose a compact feature set consisting of a unique set of speech feature functionals combined with enhanced perceptual features and modified H-coefficients. Compared with existing large feature sets, the proposed compact feature set is robust and effective for the dual task of recognizing emotions in both multilingual and mixed-lingual SER systems, in both the categorical and dimensional emotion spaces. To counter the skew in SER performance toward certain emotions, a data augmentation method is incorporated into the proposed system. The proposed SER system is also designed to recognize efficiently even the extreme emotions of boredom, disgust, sadness, and surprise in both emotion spaces. The proposed SER system is compared with existing SER systems, and the results demonstrate that it outperforms them.
Cite this Research Publication : S. Lalitha, D. Gupta, M. Zakariah, and Y. Ajami Alotaibi, “Investigation of multilingual and mixed-lingual emotion recognition using enhanced cues with data augmentation”, Applied Acoustics, vol. 170, p. 107519, 2020.