Publication Type : Conference Paper
Publisher : CEUR Workshop Proceedings, CEUR-WS.
Source : CEUR Workshop Proceedings, CEUR-WS, Volume 1737, p.304-308 (2016)
Keywords : Artificial intelligence, Codes (symbols), Cross validation, Data mining, Entity extractions, extraction, Feature based modeling, Feature extraction, Fires, Indian languages, Information Retrieval, Learning algorithms, Learning systems, Natural language processing, Natural language processing systems, Overall accuracies, Social media, Support vector machines, Word embedding
Campus : Coimbatore
School : School of Engineering
Center : Computational Engineering and Networking
Department : Electronics and Communication
Year : 2016
Abstract : Social media text holds information regarding various important aspects. Extracting such information is the basis of entity extraction, one of the most preliminary tasks in Natural Language Processing. This work was submitted as part of the Shared Task on Code Mix Entity Extraction for Indian Languages (CMEE-IL) at the Forum for Information Retrieval Evaluation (FIRE) 2016. Three different methodologies are proposed in this paper for the task of entity extraction from code-mixed data. The proposed systems include approaches based on embedding models and a feature-based model. Trigram embeddings were created and BIO tag formatting was applied during feature extraction. The systems are evaluated using a machine-learning-based classifier, SVM-Light. Overall accuracy obtained through cross validation indicates that the proposed system is also effective at classifying unknown tokens.
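Illustration : The abstract names BIO tag formatting and trigram features without showing them, so the following is a minimal sketch of what such preprocessing might look like. It is not the authors' implementation. The function names `to_bio` and `trigram_windows`, the `<PAD>` symbol, the `PERSON` label, and the sample tokens are all hypothetical, and reading "trigram" as a window of previous, current, and next token is an assumption.

```python
from typing import List, Tuple


def to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Assign BIO tags to tokens.

    Each span is (start, end_exclusive, entity_type) in token indices.
    The first token of an entity gets B-<type>, the rest get I-<type>,
    and every token outside an entity gets O.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def trigram_windows(tokens: List[str], pad: str = "<PAD>") -> List[Tuple[str, ...]]:
    """Return one (previous, current, next) window per token.

    Assumption: "trigram" is taken to mean this three-token context window;
    the paper may define it differently.
    """
    padded = [pad] + list(tokens) + [pad]
    return [tuple(padded[i:i + 3]) for i in range(len(tokens))]


# Hypothetical code-mixed sample, invented for illustration only.
tokens = ["Rahul", "Dravid", "ne", "match", "jeeta"]
spans = [(0, 2, "PERSON")]

bio_tags = to_bio(tokens, spans)
windows = trigram_windows(tokens)

for token, tag, window in zip(tokens, bio_tags, windows):
    print(token, tag, window)
```

In a pipeline of this kind, each window would typically be mapped to a numeric feature vector (for example, by concatenating word embeddings) before being passed to a classifier such as SVM-Light; that mapping step is not shown here.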
Cite this Research Publication : R. G. Devi, Veena, P. V., Dr. M. Anand Kumar, and Dr. Soman K. P., “AMRITA-CEN@FIRE 2016: Code-mix entity extraction for Hindi-English and Tamil-English tweets”, in CEUR Workshop Proceedings, 2016, vol. 1737, pp. 304-308.