Transformer‐Based Multilingual Automatic Speech Recognition (ASR) Model for Dravidian Languages

Publication Type : Book Chapter

Publisher : Scrivener Publishing LLC

Source : Automatic Speech Recognition and Translation for Low Resource Languages, 259-273, 2024

Url : https://onlinelibrary.wiley.com/doi/10.1002/9781394214624.ch13

Campus : Coimbatore

School : School of Artificial Intelligence

Center : Computational Engineering and Networking

Year : 2024

Abstract : India has rich linguistic diversity, with over 1600 indigenous languages, many of which are in cultural decline due to limited accessibility, awareness, and information. In recent years, various techniques such as recurrent neural networks (RNNs) and hidden Markov models (HMMs) have been applied to automatic speech recognition (ASR) for low-resource languages, but their performance is limited by the availability of quality datasets. The scarcity of high-quality data is a particular obstacle for Indian languages. Transformers, on the other hand, have emerged as a popular and effective deep learning model for ASR because pre-trained parameters can be fine-tuned for new tasks. OpenAI's Whisper model is an ASR system trained on a vast amount of multilingual and multitask data collected from the web. Due to its capabilities, it is considered a new benchmark for ASR. While the Whisper model does recognize some Indian languages, it has no specific training for Dravidian languages. These languages are nevertheless of particular interest because of their common roots with other Indian languages and the unique challenges of being spoken natively by a low-resource population. The aim of this chapter is to develop a multilingual ASR model for Dravidian languages such as Tamil and Telugu by leveraging the Whisper model and incorporating various speech performance metrics, including word error rate (WER). We obtained 61.2% WER for Telugu and 27.2% WER for Tamil using a minimal configuration, results that are significantly better than those of other existing models.
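
For readers unfamiliar with the metric reported above, WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The following is a minimal sketch of that standard definition, not the chapter's evaluation code: the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and the text normalization the authors applied is not stated in this abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are split on whitespace with no further
    normalization (casing, punctuation, script-specific handling).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference, so a score such as 61.2% should be read as an error ratio and not as a bounded accuracy figure.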

Cite this Research Publication : Chowdary, D.E., Ganesan, R., Dabbara, H., Jyothish Lal, G., Premjith, B., Transformer-Based Multilingual Automatic Speech Recognition (ASR) Model for Dravidian Languages, (2024) Automatic Speech Recognition and Translation for Low Resource Languages, pp. 259-273. DOI: 10.1002/9781394214624.ch13