Publication Type : Journal Article
Source : Computación y Sistemas
Url : https://www.cys.cic.ipn.mx/ojs/index.php/CyS/article/viewFile/4358/3432
Campus : Amaravati
School : School of Computing
Year : 2022
Abstract : Machine translation is one of the most powerful natural language processing applications for preserving and upgrading low-resource languages. Mizo is considered a low-resource language because few resources are available for it, which makes translation for the English-Mizo language pair a challenging task. Moreover, Mizo is a tonal language, in which a word can express different meanings depending on its tone. There are four tones: high, low, rising, and falling. Each tone is represented by a tone marker, which is added to a vowel to indicate the tone variation. Handling tonal words in machine translation for such a low-resource pair is a further challenge. In this paper, an English-Mizo corpus is developed that incorporates parallel sentences containing tonal words. Baseline systems are built by exploring different models based on statistical machine translation and neural machine translation. Furthermore, the proposed approach augments the training data by expanding the parallel data containing tonal words, and achieves state-of-the-art results for both forward and backward translation of sentences with tonal words.
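As an illustration only, the idea of tone markers on vowels and of expanding the parallel data that contains tonal words can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' method: the set of combining marks in `TONE_MARKS`, the helper names `has_tone_marker`, `select_tonal_pairs`, and `augment`, the toy strings, and the use of simple oversampling are all hypothetical choices that the abstract does not specify.

```python
import unicodedata

# Assumption: tone markers are combining diacritics on vowels.
# The exact marks (acute, grave, circumflex, caron) are a guess for illustration.
TONE_MARKS = {"\u0301", "\u0300", "\u0302", "\u030C"}
VOWELS = set("aeiouAEIOU")


def has_tone_marker(word):
    """Return True if any vowel in the word is directly followed by a tone mark."""
    # NFD splits a precomposed letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", word)
    for base, mark in zip(decomposed, decomposed[1:]):
        if base in VOWELS and mark in TONE_MARKS:
            return True
    return False


def select_tonal_pairs(pairs):
    """Keep (english, mizo) pairs whose Mizo side contains a tone-marked word."""
    return [
        (en, mz)
        for en, mz in pairs
        if any(has_tone_marker(w) for w in mz.split())
    ]


def augment(pairs, copies=1):
    """Oversample tonal pairs: append `copies` extra copies of each one.

    Oversampling is a stand-in technique chosen for this sketch only.
    """
    return list(pairs) + select_tonal_pairs(pairs) * copies


if __name__ == "__main__":
    # Toy placeholder strings, not real corpus sentences.
    toy_pairs = [
        ("source one", "a bâng b"),
        ("source two", "a bang b"),
    ]
    print(len(toy_pairs), "->", len(augment(toy_pairs, copies=2)))
```

Normalizing to NFD before checking means precomposed and decomposed spellings of the same marked vowel are treated alike, which matters when a corpus mixes text from different sources.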
Cite this Research Publication : Vanlalmuansangi Khenglawt, Sahinur Rahman Laskar, Partha Pakray, Riyanka Manna, Ajoy Kumar Khan, "Machine Translation for Low-Resource English-Mizo Pair Encountering Tonal Words", Computación y Sistemas, Vol. 26, No. 3, 2022, pp. 1377–1398, doi: 10.13053/CyS-26-3-4358. (ESCI, Scopus Indexed) [2022].