
An effective way of word-level language identification for code-mixed facebook comments using word-embedding via character-embedding

Publication Type : Conference Paper

Publisher : 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), IEEE

Source : 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), IEEE, Udupi, India (2017)

Url : https://ieeexplore.ieee.org/abstract/document/8126062

Campus : Coimbatore

School : School of Engineering

Center : Computational Engineering and Networking, Electronics Communication and Instrumentation Forum (ECIF)

Department : Communication, Electronics and Communication

Year : 2017

Abstract : Individuals use social networking sites such as Facebook and Twitter to express their interests, opinions, or reviews. In earlier days, users communicated mainly in English. Although content can now be written in Unicode characters, people find it easier to communicate by mixing two or more languages, or prefer to write their native language in Roman script. Such data is called code-mixed text. When processing such social-media data, recognizing the language of the text is an important task. In this work, we have developed a system for word-level language identification on code-mixed social media text. The work covers Tamil-English and Malayalam-English code-mixed Facebook comments. The system uses a novel approach based on features obtained from a character-based embedding technique together with context information, and uses a machine-learning classifier, a Support Vector Machine, for training and testing. An accuracy of 93% was obtained for Malayalam-English and 95% for Tamil-English code-mixed text.
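The pipeline described in the abstract (character-level features for each word, context from neighbouring words, and an SVM classifier) can be illustrated with a rough sketch. This is not the authors' implementation. It assumes plain character n-gram counts in place of the learned character-based embeddings, a one-word context window on each side, a binary English/Tamil tag set, and a small linear SVM trained with Pegasos-style subgradient steps. All function names and the toy comments are invented for illustration.

```python
# Illustrative sketch only: character n-gram counts stand in for the paper's
# learned character-based embeddings, and the toy data below is invented.

def char_ngrams(word, n_max=3):
    """Character n-grams (lengths 1..n_max) of a word wrapped in boundary markers."""
    padded = "<" + word.lower() + ">"
    return [padded[i:i + n]
            for n in range(1, n_max + 1)
            for i in range(len(padded) - n + 1)]

def featurize(tokens, i):
    """Sparse count features for token i plus its left and right neighbours."""
    feats = {"bias": 1.0}
    for offset, tag in ((-1, "prev"), (0, "cur"), (1, "next")):
        j = i + offset
        word = tokens[j] if 0 <= j < len(tokens) else ""  # empty word at comment edges
        for gram in char_ngrams(word):
            key = tag + ":" + gram
            feats[key] = feats.get(key, 0.0) + 1.0
    return feats

def score(weights, feats):
    """Dot product between a sparse weight vector and a sparse feature vector."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())

def train_linear_svm(samples, epochs=20, lam=0.01):
    """Binary linear SVM: Pegasos-style subgradient steps on the regularized hinge loss.

    samples is a list of (feature dict, label) pairs with label +1 or -1.
    """
    weights = {}
    t = 0
    for _ in range(epochs):
        for feats, y in samples:
            t += 1
            eta = 1.0 / (lam * t)                 # decaying step size
            violated = y * score(weights, feats) < 1.0
            for k in weights:                     # shrink step from the L2 regularizer
                weights[k] *= 1.0 - eta * lam
            if violated:                          # hinge-loss subgradient step
                for k, v in feats.items():
                    weights[k] = weights.get(k, 0.0) + eta * y * v
    return weights

# Invented toy comments in romanized Tamil-English; not the paper's dataset.
TOY_COMMENTS = [
    [("movie", "en"), ("romba", "ta"), ("nalla", "ta"), ("irukku", "ta")],
    [("super", "en"), ("song", "en"), ("anna", "ta")],
    [("enakku", "ta"), ("this", "en"), ("film", "en"), ("pidikkum", "ta")],
]

def build_samples(comments):
    """Turn tagged comments into (features, +1 for English / -1 for Tamil) pairs."""
    samples = []
    for comment in comments:
        tokens = [word for word, _ in comment]
        for i, (_, tag) in enumerate(comment):
            samples.append((featurize(tokens, i), 1 if tag == "en" else -1))
    return samples

def predict(weights, tokens):
    """Tag each token of an unseen comment as 'en' or 'ta'."""
    return ["en" if score(weights, featurize(tokens, i)) >= 0 else "ta"
            for i in range(len(tokens))]

weights = train_linear_svm(build_samples(TOY_COMMENTS))
print(predict(weights, ["nalla", "movie"]))
```

With only a handful of training words, the predictions are not meaningful. The sketch is meant only to show how sub-word features and neighbouring-word context can be combined into a single feature vector per word before classification.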

Cite this Research Publication : P. V. Veena, Kumar, M. A., and Dr. Soman K. P., “An effective way of word-level language identification for code-mixed facebook comments using word-embedding via character-embedding”, in 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, India, 2017.
