Deep Learning Based Part-of-Speech Tagging for Malayalam Twitter Data (Special Issue: Deep Learning Techniques for Natural Language Processing)

Publication Type : Journal Article

Publisher : Journal of Intelligent Systems

Source : Journal of Intelligent Systems, Vol 28, pp 423-435, 2018

Url : https://www.scopus.com/inward/record.uri?eid=2-s2.0-85048441316&doi=10.1515%2fjisys-2017-0520&partnerID=40&md5=13588ecb67e041589d02b1e6ad308337

Keywords : audio signal processing, bidirectional LSTM, Brain, Computational linguistics, Deep learning, gated recurrent unit, Learning methods, Learning techniques, Long short-term memory, Natural language processing systems, Part of speech tagging, Recurrent neural network (RNN), Recurrent neural networks, Sequential model, Social media datum, Social networking (online), Syntactics

Campus : Coimbatore

School : School of Engineering

Center : Computational Engineering and Networking

Department : Center for Computational Engineering and Networking (CEN), Electronics and Communication

Year : 2018

Abstract : The paper addresses part-of-speech (POS) tagging for Malayalam tweets. The conversational style of social media text makes it difficult to apply a general-purpose POS tagset. For this work, a tagset of 17 coarse tags was designed, and 9915 tweets were tagged manually for experiments and evaluation. Sequential deep learning models were trained on the tagged data: recurrent neural network (RNN), gated recurrent unit (GRU), long short-term memory (LSTM), and bidirectional LSTM (BLSTM). Training was performed at both word level and character level, and the models were evaluated using precision, recall, f1-measure, and accuracy. At word level, the GRU-based model gave the highest f1-measure, 0.9254; at character level, the BLSTM-based model gave the highest f1-measure, 0.8739. To choose a suitable number of hidden states, the value was varied over 4, 16, 32, and 64, with training performed for each; increasing the number of hidden states improved the tagger. This is an initial work on POS tagging of Malayalam Twitter data using deep learning sequential models. © 2018 Walter de Gruyter GmbH, Berlin/Boston.
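
Illustration : The evaluation measures named in the abstract (precision, recall, f1-measure, accuracy) can be computed at token level roughly as in the sketch below. This is not the authors' evaluation code. The function name `tagging_metrics`, the macro averaging over tags, and the tag labels in the usage note are all assumptions made for illustration; the abstract does not state how the scores were averaged or list the 17 tags.

```python
from collections import Counter


def tagging_metrics(gold, pred):
    """Token-level accuracy plus per-tag and macro-averaged P/R/F1.

    `gold` and `pred` are equal-length lists of tag strings.
    Macro averaging is an assumption; the source does not specify it.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for tag g
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[g] += 1      # g was the true tag but was missed

    per_tag = {}
    for tag in sorted(set(gold) | set(pred)):
        predicted = tp[tag] + fp[tag]
        actual = tp[tag] + fn[tag]
        precision = tp[tag] / predicted if predicted else 0.0
        recall = tp[tag] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_tag[tag] = (precision, recall, f1)

    n = len(per_tag)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "macro_precision": sum(v[0] for v in per_tag.values()) / n,
        "macro_recall": sum(v[1] for v in per_tag.values()) / n,
        "macro_f1": sum(v[2] for v in per_tag.values()) / n,
        "per_tag": per_tag,
    }
```

For example, calling `tagging_metrics(["NOUN", "VERB", "NOUN", "ADJ"], ["NOUN", "NOUN", "NOUN", "ADJ"])` with these hypothetical tags scores one mistagged verb out of four tokens. A tag that is never predicted gets precision 0.0 here, which is one common convention among several.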

Cite this Research Publication : Sachin Kumar S, M Anand Kumar, K P Soman, Deep Learning Based Part-of-Speech Tagging for Malayalam Twitter Data (Special Issue: Deep Learning Techniques for Natural Language Processing), Journal of Intelligent Systems, Vol 28, pp 423-435, 2018 (Scopus)