A deep learning based Part-of-Speech (POS) tagger for Sanskrit language by embedding character level features

Publication Type : Conference Paper

Publisher : FIRE'18

Source : FIRE'18 (2018)

Url : https://www.semanticscholar.org/paper/A-deep-learning-based-Part-of-Speech-(POS)-tagger-Premjith-SomanK./fa82af702a552044b0410ba1bfa2c150809fa4a6

Campus : Coimbatore

School : School of Engineering

Center : Computational Engineering and Networking

Department : CISAI, Electronics and Communication

Year : 2018

Abstract : Part-of-Speech (POS) tagging is an important task in Natural Language Processing, and numerous taggers have been developed for several languages. Many POS taggers have also been developed for Sanskrit, one of the oldest languages in the world, but machine learning based POS tagging has received less attention. In this paper, various deep learning algorithms are used to implement a POS tagger for Sanskrit. The problem is framed as a sequence labeling problem at the character level: a word to be POS tagged is treated as a sequence of characters, and the sequential relationship among the characters in a word is captured with deep learning algorithms such as the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) network, Gated Recurrent Unit (GRU) and their bidirectional versions. The character level formulation of the problem reduces the memory requirement compared to word level implementations and also increases the accuracy of labeling. The performance of the labeling task was analyzed with different combinations of hyper-parameters. We obtained an accuracy score of 97.86% with the Bidirectional GRU. The character level implementations of both unidirectional and bidirectional forms of RNN, LSTM and GRU outperformed all word level implementations in terms of accuracy, number of trainable parameters and storage requirement.
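
The character level framing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the transliterated toy words, the helper names (build_vocab, encode_word, embedding_params, gru_params), the padding scheme and all sizes are assumptions chosen for illustration. The GRU count uses the textbook formulation with one bias vector per gate; some libraries keep two bias vectors per gate and report slightly larger totals.

```python
def build_vocab(tokens):
    """Map each distinct token to an integer id; id 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode_word(word, char_vocab, max_len):
    """Turn a word into a fixed-length list of character ids (truncate, then zero-pad)."""
    ids = [char_vocab[ch] for ch in word][:max_len]
    return ids + [0] * (max_len - len(ids))


def embedding_params(vocab_size, embed_dim):
    """Trainable parameters in an embedding table: one vector per vocabulary entry."""
    return vocab_size * embed_dim


def gru_params(input_dim, hidden_dim, bidirectional=False):
    """Trainable parameters in one GRU layer (textbook form, one bias per gate).

    Three gates (update, reset, candidate), each with input weights,
    recurrent weights and a bias vector.
    """
    per_direction = 3 * (hidden_dim * (input_dim + hidden_dim) + hidden_dim)
    return per_direction * (2 if bidirectional else 1)


# Hypothetical transliterated toy words, not taken from the paper's corpus.
words = ["ramah", "vanam", "gacchati"]

# Character vocabulary: built from every character of every word.
char_vocab = build_vocab(ch for w in words for ch in w)

# Each word becomes a sequence of character ids that a recurrent layer can read.
encoded = [encode_word(w, char_vocab, max_len=8) for w in words]

# Illustrative sizes only (assumed): a small character inventory versus a
# large word vocabulary, both embedded in 64 dimensions.
char_table = embedding_params(vocab_size=50, embed_dim=64)
word_table = embedding_params(vocab_size=50_000, embed_dim=64)

bigru = gru_params(input_dim=64, hidden_dim=128, bidirectional=True)
```

With these assumed sizes the word level embedding table is 1000 times larger than the character level one, which is the kind of saving the abstract attributes to the character level formulation; the actual figures depend on the corpus and hyper-parameters used in the paper.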

Cite this Research Publication : Premjith, B., Soman, K.P., Poornachandran, P., A deep learning based Part-of-Speech (POS) tagger for Sanskrit language by embedding character level features, (2018) ACM International Conference Proceeding Series, pp. 56-60, DOI: 10.1145/3293339.3293352
