Publication Type : Conference Paper
Publisher : IEEE
Source : In 2022 IEEE 19th India Council International Conference (INDICON) (pp. 1-6). IEEE
Url : https://ieeexplore.ieee.org/document/10039985
Campus : Bengaluru
School : School of Engineering
Department : Electronics and Communication
Year : 2022
Abstract : For a nation such as India, with more than 20 formally recognized regional languages, overcoming language barriers is essential for smooth communication. Researchers have been working on intelligent engines that bridge the gap between natural languages through machine perception. This paper proposes and describes an Automatic Speech Recognition (ASR) system for the Sourashtra language, built using the Kaldi toolkit. A custom speech dataset was created with the help of native Sourashtra speakers. Because no phoneme representation currently exists for this language, the Devanagari script was used for transliteration and language modelling, following the ILSL12 convention. On a total of 2000 word utterances, we achieve a word error rate (WER) of 5.5 and a character error rate (CER) of 0.2 using a GMM-HMM acoustic model trained on monophones.
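WER and CER are edit-distance metrics: the number of substitutions (S), deletions (D) and insertions (I) needed to turn the recognized output into the reference transcript, divided by the number of reference units (N), i.e. (S + D + I) / N. The sketch below shows how such figures are typically computed. It is not the authors' scoring code. The function names are hypothetical, and it assumes the results are reported as percentages, which is the convention of Kaldi's `compute-wer` output. The abstract itself does not state the unit. For CER it also treats each Unicode code point as a "character", so Devanagari vowel signs (matras) count separately from their consonants.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (sub/del/ins each cost 1)."""
    # prev[j] = distance between the first i-1 reference tokens and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference token
                          curr[j - 1] + 1,     # insertion of a hypothesis token
                          prev[j - 1] + cost)  # substitution (or match if cost == 0)
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate in percent: total edits / total reference units * 100.

    unit="word" splits on whitespace (WER); unit="char" compares Unicode
    code points with spaces removed (CER). Both choices are assumptions.
    """
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        if unit == "word":
            r, h = ref.split(), hyp.split()
        else:
            r, h = list(ref.replace(" ", "")), list(hyp.replace(" ", ""))
        errors += edit_distance(r, h)
        total += len(r)
    return 100.0 * errors / total


# Toy example: one substitution (b -> x) and one deletion (d) over 4 reference words.
print(error_rate(["a b c d"], ["a x c"], unit="word"))  # 50.0
```

Because the errors are summed over the whole test set before dividing, long and short utterances are weighted by their length. Averaging per-utterance rates would generally give a different number.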
Cite this Research Publication : Vancha, P., Nagarajan, H., Inakollu, V. S., Gupta, D., & Vekkot, S. (2022, November). Word-Level Speech Dataset Creation for Sourashtra and Recognition System Using Kaldi. In 2022 IEEE 19th India Council International Conference (INDICON) (pp. 1-6). IEEE