Publication Type : Conference Paper
Thematic Areas : Center for Computational Engineering and Networking (CEN)
Campus : Coimbatore
School : School of Artificial Intelligence - Coimbatore
Department : Center for Computational Engineering and Networking (CEN)
Year : 2022
Abstract : The two key components of Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems are language modeling and acoustic modeling. The language model generates a lexicon, which is a pronunciation dictionary. A lexicon can be created using a variety of approaches. For low-resource languages, rule-based methods are typically employed to build the lexicon. However, because the corpus is often small, this approach does not account for all possible pronunciation variations. As a result, low-resource languages like Malayalam require a method for developing a comprehensive lexicon as the corpus grows. In this work, we explored deep-learning-based encoder-decoder models for grapheme-to-phoneme (G2P) conversion in Malayalam. Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM) networks with varying embedding dimensions were used as the encoder. The performance of the G2P models was measured using the Word Error Rate (WER) and Phoneme Error Rate (PER). With an embedding dimension of 1024, the BiLSTM encoder achieved the highest accuracy of 98.04% and the lowest PER of 2.57% at the phoneme level, and the highest accuracy of 90.58% and the lowest WER of 9.42% at the word level.
Cite this Research Publication : Priyamvada, R., Govind, D., Menon, V.K., Premjith, B., Soman, K.P., "Grapheme to Phoneme Conversion for Malayalam Speech Using Encoder-Decoder Architecture," (2022) Smart Innovation, Systems and Technologies, 266, pp. 41-49, DOI: 10.1007/978-981-16-6624-7_5
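Illustration : The abstract reports results as Phoneme Error Rate (PER) and Word Error Rate (WER) but does not spell out how they are computed. The sketch below assumes the conventional G2P definitions: PER is the total Levenshtein (edit) distance between predicted and reference phoneme sequences divided by the total number of reference phonemes, and WER is the share of words whose predicted pronunciation differs from the reference in any way. The function names (`edit_distance`, `g2p_error_rates`) and the toy phoneme symbols are hypothetical and are not taken from the paper or its data.

```python
from typing import Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def g2p_error_rates(refs: Sequence[Sequence[str]],
                    hyps: Sequence[Sequence[str]]) -> Tuple[float, float]:
    """Return (PER, WER) in percent under the assumed definitions."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of words")
    total_phonemes = sum(len(r) for r in refs)
    if total_phonemes == 0:
        raise ValueError("reference pronunciations must not all be empty")
    phoneme_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    # A word counts as wrong if its predicted pronunciation is not an exact match.
    word_errors = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    per = 100.0 * phoneme_errors / total_phonemes
    wer = 100.0 * word_errors / len(refs)
    return per, wer


# Toy example with made-up phoneme sequences (not Malayalam data from the paper).
refs = [["k", "a", "l", "a"], ["m", "a", "l", "a"], ["p", "a", "t", "a", "m"]]
hyps = [["k", "a", "l", "a"], ["m", "a", "l"], ["p", "a", "d", "a", "m"]]
per, wer = g2p_error_rates(refs, hyps)
print(f"PER = {per:.2f}%, WER = {wer:.2f}%")
```

Under these assumed definitions, word-level accuracy is simply 100% minus WER, which matches the word-level figures in the abstract (90.58% and 9.42%). A single wrong phoneme makes the whole word count as an error, so WER is normally well above PER for the same model.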