
Co-occurrence based word representation for extracting named entities in Tamil tweets

Publication Type : Conference Paper

Publisher : Journal of Intelligent and Fuzzy Systems, IOS Press

Source : Journal of Intelligent and Fuzzy Systems, IOS Press, Volume 34, Number 3, p.1435-1442 (2018)

Url : https://www.scopus.com/inward/record.uri?eid=2-s2.0-85044716889&doi=10.3233%2fJIFS-169439&partnerID=40&md5=368d345846b8620d81dfa2531664fab7

Keywords : Character recognition, Co-occurrence information, Computational linguistics, Feature extraction methods, GloVe embedding, N-grams, Natural language processing systems, Social networking (online), Structured skip-gram, Support vector machines, Unstructured texts, Word representations, Word2vec

Campus : Coimbatore

School : School of Engineering

Center : Computational Engineering and Networking

Department : Electronics and Communication

Year : 2018

Abstract : Social media is a vibrant space where millions of individuals interact and share their views. Processing social media text in Indian languages is challenging because these languages are morphologically rich. Once such unstructured text has been converted into a consistent format, the data is passed to a feature extraction method. In large corpora, information units, i.e. entities, carry the basic idea of the content. The main aim of the system is to recognise and extract named entities in Twitter text. The proposed system relies on co-occurrence based word embedding models to extract features for the words in the dataset, and it uses Tamil-language text data from Twitter. To enhance the performance of the system, tri-gram features are extracted from the word embedding vectors. The systems are then trained using these N-gram embedding features and named entity tags. The system is implemented with a machine learning classifier, the Support Vector Machine (SVM). Comparing the proposed systems, GloVe embedding gives the better result, with an accuracy of 96.93%, whereas word2vec embedding reaches an accuracy of 84.53%. The higher accuracy of the GloVe-based system may be due to the important role that the co-occurrence information in GloVe embedding plays in recognising entities. © 2018 - IOS Press and the authors. All rights reserved.
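
The tri-gram embedding features mentioned in the abstract can be illustrated with a minimal sketch. It rests on an assumption, since the abstract does not spell out the construction: here a tri-gram feature for a token is taken to be the concatenation of the embedding vectors of the previous, current, and next token, with zero vectors used at tweet boundaries and for unknown words. The function name `trigram_features`, the toy 2-dimensional vectors, and the placeholder tokens are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def trigram_features(
    tokens: Sequence[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> List[List[float]]:
    """Build one feature vector per token by concatenating the embeddings
    of the previous, current, and next token (assumed construction).

    Out-of-vocabulary tokens and positions beyond the tweet boundaries
    are represented by a zero vector of length `dim`.
    """
    zero = [0.0] * dim
    # Look up each token, falling back to the zero vector when it is unknown.
    vectors = [list(embeddings.get(tok, zero)) for tok in tokens]
    # Pad one zero vector on each side so every token has two neighbours.
    padded = [zero] + vectors + [zero]
    return [
        padded[i] + padded[i + 1] + padded[i + 2]
        for i in range(len(tokens))
    ]


# Toy, made-up embeddings standing in for GloVe or word2vec vectors.
toy_embeddings = {
    "chennai": [1.0, 0.0],
    "is": [0.0, 1.0],
    "hot": [0.5, 0.5],
}
tweet = ["chennai", "is", "hot"]
features = trigram_features(tweet, toy_embeddings, dim=2)

for token, feature in zip(tweet, features):
    print(token, feature)
```

Each resulting vector has three times the embedding dimension. Paired with one named entity tag per token, vectors of this kind could serve as training input to an SVM classifier, for example the `SVC` class in scikit-learn; whether the authors used that library is not stated in the abstract.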

Cite this Research Publication : R. G. Devi, M. Kumar, A., and Dr. Soman K. P., “Co-occurrence based word representation for extracting named entities in Tamil tweets”, in Journal of Intelligent and Fuzzy Systems, 2018, vol. 34, pp. 1435-1442.
