Publication Type : Journal Article
Publisher : ACM Transactions on Asian and Low-Resource Language Information Processing
Source : ACM Trans. Asian Low-Resour. Lang. Inf. Process., Association for Computing Machinery, Volume 20, Number 6, New York, NY, USA (2021)
Url : https://doi.org/10.1145/3457976
Keywords : bidirectional RNN, Conditional random field, gated recurrent unit, long short-term memory networks, Morphological generation, Recurrent neural networks, stacked RNN
Campus : Coimbatore
School : School of Artificial Intelligence, School of Artificial Intelligence - Coimbatore, School of Engineering
Center : Computational Engineering and Networking
Department : Electronics and Communication
Year : 2021
Abstract : Morphological synthesis is one of the main components of Machine Translation (MT) frameworks, especially when either or both of the source and target languages are morphologically rich. Morphological synthesis is the process of combining two words or two morphemes according to the Sandhi rules of the morphologically rich language. Malayalam and Tamil are two Indian languages that are both morphologically rich and agglutinative. Morphological synthesis of a word in these two languages is challenging for the following reasons: (1) abundance in morphology; (2) complex Sandhi rules; (3) the possibility in Malayalam of forming words by combining words that belong to different syntactic categories (for example, noun and verb); and (4) the construction of a sentence by combining multiple words. We formulated the task of the morphological generation of nouns and verbs of Malayalam and Tamil as a character-to-character sequence tagging problem. In this article, we used deep learning architectures such as the Recurrent Neural Network (RNN), Long Short-Term Memory Network (LSTM), Gated Recurrent Unit (GRU), and their stacked and bidirectional versions to implement morphological synthesis at the character level. In addition, we investigated the performance of combining these deep learning architectures with the Conditional Random Field (CRF) in the morphological synthesis of nouns and verbs in Malayalam and Tamil. We observed that adding a CRF to the Bidirectional LSTM/GRU architecture yielded more than 99% accuracy in the morphological synthesis of Malayalam and Tamil nouns and verbs.
Cite this Research Publication : Premjith, B., Soman, K.P., Deep Learning Approach for the Morphological Synthesis in Malayalam and Tamil at the Character Level, (2021) ACM Transactions on Asian and Low-Resource Language Information Processing, 20 (6), art. no. 94. DOI: 10.1145/3457976
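The abstract describes placing a CRF on top of a bidirectional LSTM/GRU that tags each character. The sketch below is not the authors' implementation; it only illustrates, under stated assumptions, why a linear-chain CRF layer can differ from picking the best label at each character independently. The function name `viterbi_decode`, the two-label setup, and all scores are hypothetical toy values; in a real model the emission scores would come from the recurrent encoder and the transition scores would be learned.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence for a linear-chain CRF.

    emissions[t][k]   : score of label k at character position t
                        (assumed to come from a BiLSTM/GRU encoder).
    transitions[j][k] : score of moving from label j to label k.
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    # best[k] = best score of any path that ends in label k at the current position
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for k in range(num_labels):
            # Pick the previous label that maximizes the path score into label k.
            prev = max(range(num_labels), key=lambda j: best[j] + transitions[j][k])
            new_best.append(best[prev] + transitions[prev][k] + emissions[t][k])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label to recover the full path.
    last = max(range(num_labels), key=lambda k: best[k])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical toy scores: 3 character positions, 2 labels.
toy_emissions = [[2.0, 1.0], [1.0, 1.5], [0.5, 2.0]]
# Hypothetical transitions that strongly penalize switching labels.
toy_transitions = [[0.0, -5.0], [-5.0, 0.0]]
print(viterbi_decode(toy_emissions, toy_transitions))
```

With these toy numbers, choosing the top label at each position independently would give labels 0, 1, 1, whereas the joint decode also accounts for the transition penalty between neighbouring characters.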