Publication Type : Journal Article
Source : (2015)
Keywords : Information Retrieval, Language Identification, Mixed Script, Short Message, Support Vector Machine (SVM)
Campus : Coimbatore
School : School of Engineering
Center : Computational Engineering and Networking
Department : Electronics and Communication
Year : 2015
Abstract : The growth of social media content, such as Twitter and Facebook messages and blog posts, has created many new opportunities for language technology. User-generated content such as tweets and blogs is, in most languages, written in Roman script owing to distinct social culture and technology. Some of it is written in the language's own script or in mixed script. The primary challenge in processing short messages is identifying the languages used. Language identification is therefore not restricted to a single language but extends to multiple languages. The task is to label each word with one of the following categories: L1, L2, Named Entities, Mixed, Punctuation, and Others. This paper presents the AmritaCen_NLP team's participation in the FIRE 2015 Shared Task on Mixed Script Information Retrieval, Subtask 1: Query Word Labeling, which covers language identification of each word in the text together with the Named Entities, Mixed, Punctuation, and Others categories. The approach uses sequence-level query labelling with a Support Vector Machine.
Cite this Research Publication : R. Venkatesh Kumar, Dr. M. Anand Kumar, and Dr. Soman K. P., “AmritaCEN_NLP@ FIRE 2015 Language Identification for Indian Languages in Social Media Text”, 2015.