Indian Language Identification for Short Text

Publication Type : Conference Proceedings

Publisher : Advances in Intelligent Systems and Computing, Springer

Source : Advances in Intelligent Systems and Computing, Springer, p.47-58 (2021)

Url : https://link.springer.com/chapter/10.1007/978-981-15-1275-9_5

Keywords : accuracy, language identification, n-gram, trigrams

Campus : Bengaluru

School : Department of Computer Science and Engineering, School of Engineering

Department : Computer Science

Year : 2021

Abstract : Language identification determines the language in which a given document is written. Labelling content by language can improve search results for multilingual documents. In this work, we classify each line of text into a particular language, focusing on short phrases of 2 to 6 words in 15 Indian languages. The method detects whether a given document is multilingual and identifies the Indian languages it contains. The approach combines an n-gram technique with a list of short distinctive words. The n-gram model is language independent, whereas the short-word method requires less computation. The results show the effectiveness of our approach on synthetic data.
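
The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the three-language toy corpus, the short-word lists, the trigram overlap score, and the order in which the two methods are applied (short-word lookup first, trigram fallback second) are all assumptions made for illustration. The paper covers 15 languages and may combine the two methods differently.

```python
from collections import Counter


def char_trigrams(text):
    """Count character trigrams, padding the normalised line with one space per side."""
    padded = f" {' '.join(text.split())} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def build_profiles(samples):
    """Turn {language: [training lines]} into relative trigram frequencies."""
    profiles = {}
    for lang, lines in samples.items():
        counts = Counter()
        for line in lines:
            counts.update(char_trigrams(line))
        total = sum(counts.values())
        profiles[lang] = {gram: n / total for gram, n in counts.items()}
    return profiles


def identify_line(line, profiles, short_words):
    """Label one line: distinctive short words first, trigram overlap as fallback."""
    tokens = line.split()
    # Cheap step: count tokens that appear in each language's short-word list.
    hits = Counter({lang: sum(t in words for t in tokens)
                    for lang, words in short_words.items()})
    ranked = hits.most_common(2)
    if ranked and ranked[0][1] > 0 and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
        return ranked[0][0]
    # Fallback: weight each trigram of the line by its frequency in the profile.
    grams = char_trigrams(line)
    scores = {lang: sum(prof.get(g, 0.0) * n for g, n in grams.items())
              for lang, prof in profiles.items()}
    # If every score is zero, max() returns the first language; a real system
    # would need an explicit "unknown" label here.
    return max(scores, key=scores.get)


def identify_document(text, profiles, short_words):
    """Return per-line labels and whether more than one language was found."""
    labels = [identify_line(line, profiles, short_words)
              for line in text.splitlines() if line.strip()]
    return labels, len(set(labels)) > 1


# Toy data (assumed for illustration; far too small for real use).
SAMPLES = {
    "hi": ["यह मेरा घर है", "वह स्कूल जा रहा है"],
    "mr": ["हे माझे घर आहे", "तो शाळेत जात आहे"],
    "ta": ["இது என் வீடு"],
}
SHORT_WORDS = {
    "hi": {"है", "और"},
    "mr": {"आहे", "आणि"},
    "ta": {"இது", "ஒரு"},
}
PROFILES = build_profiles(SAMPLES)

labels, multilingual = identify_document("यह घर है\nஇது ஒரு வீடு", PROFILES, SHORT_WORDS)
print(labels, multilingual)
```

Hindi and Marathi share the Devanagari script, so script detection alone cannot separate them; in this sketch the short-word lists handle lines that contain a distinctive word, and the trigram profiles handle the rest.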

Cite this Research Publication : S. Bhaskaran, Geetika Paul, Dr. Deepa Gupta, and Amudha J., “Indian Language Identification for Short Text”, Advances in Intelligent Systems and Computing. Springer, pp. 47-58, 2021.
