Publication Type : Conference Paper
Publisher : 9th International Conference on Advanced Computing
Source : 9th International Conference on Advanced Computing (ICoAC 2017), MIT, Chennai (2017)
Campus : Bengaluru
School : Department of Computer Science and Engineering, School of Engineering
Department : Computer Science
Year : 2017
Abstract : Language identification determines the language in which a given document is written. Categorizing content by language can improve search results for multilingual documents. In this work, we classify each line of text into a particular language, focusing on short phrases of 2–6 words in 15 Indian languages. The system detects whether a given document is multilingual and identifies the Indian languages it contains. The approach combines an n-gram technique with a list of short distinctive words. The n-gram model is language independent, whereas the short-word method requires less computation. The results show the effectiveness of our approach on synthetic data.
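Illustrative sketch : The abstract names two ingredients, a short distinctive word list and an n-gram model, applied line by line. The Python below is a minimal sketch of how such a combination might look, not the Langtool implementation. Every name in it (SHORT_WORDS, SAMPLES, identify_line, languages_in_document), the three-language toy data, the use of character trigrams, and the "word list first, n-grams as fallback" ordering are assumptions made for illustration; the paper covers 15 languages and may combine the two methods differently.

```python
from collections import Counter

# Toy data invented for this sketch (not taken from the paper).
SHORT_WORDS = {
    "hi": {"है", "और", "में"},
    "mr": {"आहे", "आणि"},
    "ta": {"ஒரு", "இது"},
}
SAMPLES = {
    "hi": ["यह किताब है", "भारत और नेपाल"],
    "mr": ["हे पुस्तक आहे", "भारत आणि नेपाळ"],
    "ta": ["இது ஒரு புத்தகம்", "தமிழ் மொழி இனிமை"],
}


def char_ngrams(text, n=3):
    """Return the character n-grams of text, padded with one space per side."""
    padded = f" {text.strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_profile(samples, n=3):
    """Count the character n-grams seen in a language's sample lines."""
    counts = Counter()
    for sample in samples:
        counts.update(char_ngrams(sample, n))
    return counts


# One n-gram profile per language, built from the toy samples.
PROFILES = {lang: build_profile(lines) for lang, lines in SAMPLES.items()}


def ngram_score(line, profile, n=3):
    """Fraction of the line's n-grams that also occur in the profile."""
    grams = char_ngrams(line, n)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g in profile) / len(grams)


def identify_line(line, profiles=PROFILES, short_words=SHORT_WORDS, n=3):
    """Label one line: cheap word-list lookup first, n-gram overlap otherwise."""
    tokens = set(line.split())
    hits = {lang: len(tokens & words) for lang, words in short_words.items()}
    best = max(hits, key=hits.get)
    # Accept the word-list answer only if exactly one language has the top hit count.
    if hits[best] > 0 and list(hits.values()).count(hits[best]) == 1:
        return best
    scores = {lang: ngram_score(line, p, n) for lang, p in profiles.items()}
    return max(scores, key=scores.get)


def languages_in_document(lines):
    """Return the set of labels assigned to the non-blank lines."""
    return {identify_line(line) for line in lines if line.strip()}


document = ["यह किताब है", "हे पुस्तक आहे", "தமிழ் மொழி"]
found = languages_in_document(document)
is_multilingual = len(found) > 1
```

In this sketch, a document counts as multilingual when its lines receive more than one label. The word-list lookup costs only a set intersection per line, so the n-gram comparison runs only when no single language wins the lookup.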
Cite this Research Publication : S. Bhaskaran, Paul, G., Dr. Deepa Gupta, and Amudha, J., “Langtool: Identification of Indian Language for short Text”, in 9th International Conference on Advanced Computing (ICoAC 2017), MIT, Chennai, 2017.