
CEN@ Amrita: Information Retrieval on CodeMixed Hindi-English Tweets Using Vector Space Models

Publication Type : Journal Article

Publisher : Working notes of FIRE, p.7–10.

Source : Working notes of FIRE, p.7–10 (2016)

Url : http://ceur-ws.org/Vol-1737/T3-9.pdf

Keywords : CodeMixed social media, Information Retrieval, Mixed-Script, Semantics, Vector-space-models

Campus : Coimbatore

School : School of Engineering

Center : Computational Engineering and Networking

Department : Center for Computational Engineering and Networking (CEN), Computer Science

Year : 2016

Abstract : Information retrieval from social media platforms is a major challenge today. Most of the information on these platforms is informal and noisy, which makes the retrieval task harder. The task is even more difficult for Twitter because of its per-tweet character limit, which forces users to express themselves in a condensed set of words. In the Indian context, the scenario is more complicated still: users prefer to type in their mother tongue, but a lack of input tools forces them to use Roman script with English embeddings. This combination of multiple languages written in Roman script makes information retrieval even harder. Query processing for such CodeMixed content is difficult because a query can be in either language and needs to be matched against documents written in any of the languages. In this work, we addressed this problem using Vector Space Models, which gave significantly better results than the other participants. The Mean Average Precision (MAP) for our system.
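
The abstract names two ideas that a small example can make concrete: ranking documents against a query in a vector space, and scoring a ranking with Mean Average Precision. The sketch below is illustrative only and is not the authors' system. It assumes plain whitespace tokenization, smoothed TF-IDF weights, and cosine similarity; the function names and the toy Roman-script tweets are invented for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real tweet preprocessing.
    return text.lower().split()


def build_tfidf(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so that no term gets a zero or negative weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return document indices ordered by decreasing cosine similarity to the query."""
    vectors, idf = build_tfidf(docs)
    # Query terms never seen in the collection are dropped.
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    # Ties are broken by document index so the ordering is deterministic.
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


def average_precision(ranking, relevant):
    """Mean of precision@k taken at each rank k where a relevant document appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for pos, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / pos
    return total / len(relevant)


def mean_average_precision(rankings, relevant_sets):
    """Average of the per-query average precision values."""
    scores = [average_precision(r, rel) for r, rel in zip(rankings, relevant_sets)]
    return sum(scores) / len(scores) if scores else 0.0


# Invented toy collection of Roman-script Hindi-English tweets.
TOY_DOCS = [
    "aaj ka match bahut accha tha",
    "traffic bahut zyada hai today",
    "match ke tickets kahan milenge",
]
TOY_RANKING = rank("match tickets", TOY_DOCS)
```

Word-level matching like this fails when the same word is transliterated with different spellings, which is common in Roman-script Hindi. Character n-gram features are one frequently used alternative, though whether the paper used them is not stated in this abstract.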

Cite this Research Publication : S. Singh, Dr. M. Anand Kumar, and Dr. Soman K. P., “CEN@ Amrita: Information Retrieval on CodeMixed Hindi-English Tweets Using Vector Space Models”, Working notes of FIRE, pp. 7–10, 2016.
