Publication Type : Book Chapter
Publisher : SpringerLink
Source : International Conference on Innovations in Bio-Inspired Computing and Applications
Url : https://link.springer.com/chapter/10.1007/978-3-031-27499-2_8
Campus : Coimbatore
School : School of Artificial Intelligence - Coimbatore
Center : Center for Computational Engineering and Networking
Year : 2022
Abstract : The importance of speech emotion recognition has increased with the growing acceptance of intelligent conversational assistant services. Emotion recognition and analysis can improve communication between humans and machines. We propose the application of attention-based deep learning techniques to process and recognize speech emotions. In this paper, we examine two major approaches, CNN-LSTM and Mel spectrogram-Vision Transformer based models, and compare them against existing benchmarks. The experimental results support the feature extraction strategy of deep learning based approaches, which eliminates the need to handpick features for the traditional machine learning (ML) classifiers present in the current literature. A comparative evaluation of the CNN-LSTM and Vision Transformer (ViT) models is established from the experimental results. Both models performed similarly and surpassed the existing benchmarks, with CNN-LSTM achieving an accuracy of 88.50% compared with 85.36% for ViT, which opens up scope for studying attention and image processing based learning for speech emotion recognition.
Cite this Research Publication : Kumar, CS Ayush, Advaith Das Maharana, Srinath Murali Krishnan, Sannidhi Sri Sai Hanuma, G. Jyothish Lal, and Vinayakumar Ravi. "Speech Emotion Recognition Using CNN-LSTM and Vision Transformer." In International Conference on Innovations in Bio-Inspired Computing and Applications, pp. 86-97. Cham: Springer Nature Switzerland, 2022.