
Indian Language Identification for Short Text

Publication Type : Conference Proceedings

Publisher : Advances in Intelligent Systems and Computing, Springer

Source : Advances in Intelligent Systems and Computing, Springer, pp. 47-58 (2021)

Url : https://link.springer.com/chapter/10.1007/978-981-15-1275-9_5

Keywords : accuracy, language identification, n-gram, trigrams

Campus : Bengaluru

School : Department of Computer Science and Engineering, School of Engineering

Department : Computer Science

Year : 2021

Abstract : Language identification is used to determine the language of a given document. Categorizing content by language can improve search results for multilingual documents. In this work, we classify each line of text into a particular language, focusing on short phrases of 2 to 6 words in 15 Indian languages. The system detects whether a given document is multilingual and identifies the Indian languages it contains. The approach combines an n-gram technique with a list of short distinctive words: the n-gram model is language-independent, whereas the short-word method requires less computation. The results show the effectiveness of our approach on synthetic data.
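The abstract does not give the scoring details, so the following is only a minimal sketch of how a line-level identifier combining n-grams with a short-word list might look. It assumes character trigrams scored with add-one-smoothed log-probabilities, a short-word lookup that takes precedence whenever it produces a single winning language, and a document counted as multilingual when its lines receive more than one label. The class `ShortTextLanguageIdentifier`, the three toy training sentences, and the word lists are hypothetical illustrations, not the authors' data, code, or 15-language setup.

```python
import math
from collections import Counter


def char_trigrams(text):
    # Pad with two spaces on each side so word boundaries yield trigrams too.
    padded = f"  {text.strip()}  "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class ShortTextLanguageIdentifier:
    """Hypothetical sketch: short-word vote first, trigram model as fallback."""

    def __init__(self, training_text, short_words):
        # training_text: {language: sample text}; short_words: {language: set of words}
        self.profiles = {lang: Counter(char_trigrams(t)) for lang, t in training_text.items()}
        self.totals = {lang: sum(c.values()) for lang, c in self.profiles.items()}
        vocab = set()
        for counts in self.profiles.values():
            vocab.update(counts)
        self.vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen trigrams
        self.short_words = short_words

    def _short_word_vote(self, line):
        # Strip common punctuation (including the Devanagari danda) before lookup.
        tokens = [tok.strip(".,!?।") for tok in line.split()]
        hits = {lang: sum(tok in words for tok in tokens)
                for lang, words in self.short_words.items()}
        best = max(hits.values(), default=0)
        winners = [lang for lang, n in hits.items() if n == best]
        # Only trust the short-word list when it picks exactly one language.
        return winners[0] if best > 0 and len(winners) == 1 else None

    def _trigram_score(self, line, lang):
        # Sum of add-one-smoothed log-probabilities of the line's trigrams.
        counts, total = self.profiles[lang], self.totals[lang]
        return sum(math.log((counts[t] + 1) / (total + self.vocab_size))
                   for t in char_trigrams(line))

    def classify_line(self, line):
        lang = self._short_word_vote(line)
        if lang is not None:
            return lang
        return max(self.profiles, key=lambda l: self._trigram_score(line, l))

    def identify_document(self, document):
        # Label every non-empty line; more than one distinct label means multilingual.
        labels = [self.classify_line(line) for line in document.splitlines() if line.strip()]
        return labels, len(set(labels)) > 1


# Toy data for illustration only (hypothetical, far smaller than any real corpus).
TRAINING_TEXT = {
    "hindi": "यह एक किताब है। मेरा नाम राम है। वह घर जा रहा है।",
    "marathi": "हे एक पुस्तक आहे. माझे नाव राम आहे. तो घरी जात आहे.",
    "kannada": "ಇದು ಒಂದು ಪುಸ್ತಕ. ನನ್ನ ಹೆಸರು ರಾಮ. ಅವನು ಮನೆಗೆ ಹೋಗುತ್ತಿದ್ದಾನೆ.",
}
SHORT_WORDS = {
    "hindi": {"है", "यह", "मेरा"},
    "marathi": {"आहे", "हे", "माझे"},
    "kannada": {"ಇದು", "ನನ್ನ"},
}

if __name__ == "__main__":
    identifier = ShortTextLanguageIdentifier(TRAINING_TEXT, SHORT_WORDS)
    print(identifier.identify_document("मेरा घर\nಮನೆಗೆ ಹೋಗುತ್ತಿದ್ದಾನೆ"))
```

Letting the short-word list decide first mirrors the abstract's point that it is cheap to compute, while the trigram fallback needs no language-specific rules and can separate languages that share a script, such as Hindi and Marathi in Devanagari.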

Cite this Research Publication : S. Bhaskaran, Geetika Paul, Dr. Deepa Gupta, and Amudha J., “Indian Language Identification for Short Text”, Advances in Intelligent Systems and Computing. Springer, pp. 47-58, 2021.
