A comparative study of English to Kannada baseline machine translation system with general and Bible text corpus

Publication Type : Journal Article

Publisher : International Journal of Applied Engineering Research

Source : International Journal of Applied Engineering Research

Url : https://www.researchgate.net/publication/285602593_A_comparative_study_of_english_to_kannada_baseline_machine_translation_system_with_general_and_bible_text_corpus

Campus : Bengaluru

School : School of Arts and Sciences

Year : 2015

Abstract : In this paper we present the insights gained from a detailed study of a Kannada-English statistical machine translation system with reference to corpus creation. We propose approaches to create a quality corpus that can enhance class categories in translation modelling and thereby improve machine translation. Statistical machine translation (SMT) is an approach to MT that is characterized by the use of machine learning methods. The accuracy of these systems depends crucially on the quantity and domain of the data. In an SMT system, data is pre-processed consistently. Agglutinative and morphologically rich Indian languages require a huge amount of corpus creation because SMT treats each morphological variant of a word as a separate token rather than a related token. So we need to create related words and sentences as unique entries in a corpus. Working with the English-Kannada language pair, with a small data set of 2500 sentences and a big openly available Bible corpus, we show that token types and their frequency play a major role in improving the BLEU score of our baseline MT system. We report comparative results of experiments conducted on these two corpora for the English to Kannada baseline MT system.
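
The abstract's point about token types and their frequency can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `token_stats` and `type_token_ratio`, the whitespace tokenization, and the toy sentences (including the approximate romanized Kannada forms) are all assumptions made for illustration. It shows how inflected forms of one stem each count as a separate type when tokens are compared as plain strings.

```python
from collections import Counter


def token_stats(sentences):
    """Return (token_count, type_count, frequencies) using whitespace tokenization."""
    freq = Counter()
    for sentence in sentences:
        freq.update(sentence.lower().split())
    return sum(freq.values()), len(freq), freq


def type_token_ratio(sentences):
    """Distinct types divided by total tokens; 0.0 for an empty corpus."""
    tokens, types, _ = token_stats(sentences)
    return types / tokens if tokens else 0.0


# Hypothetical toy data, not taken from the paper's corpora.
english = ["the house is big", "in the house", "from the house"]
# Approximate romanized Kannada: case suffixes attach to the stem "mane" (house),
# so each inflected form becomes its own string.
kannada = ["mane doddadu", "maneyalli", "maneyinda"]

for name, corpus in (("English", english), ("Kannada", kannada)):
    tokens, types, freq = token_stats(corpus)
    print(name, tokens, types, round(type_token_ratio(corpus), 2), freq.most_common(2))
```

In this toy data the English word "house" is seen three times, while each Kannada form of the same stem is seen once. A string-based model therefore has less evidence per Kannada type, which is consistent with the abstract's claim that morphologically rich languages need a larger corpus.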

Cite this Research Publication : Km, Shivakumar, Namitha, B.N., & Nithya, R. (2015). A comparative study of English to Kannada baseline machine translation system with general and Bible text corpus. International Journal of Applied Engineering Research, 10, 30195-30202.
