
A deep learning based Part-of-Speech (POS) tagger for Sanskrit language by embedding character level features

Publication Type : Conference Paper

Publisher : FIRE'18

Source : FIRE'18 (2018)

Url : https://www.semanticscholar.org/paper/A-deep-learning-based-Part-of-Speech-(POS)-tagger-Premjith-SomanK./fa82af702a552044b0410ba1bfa2c150809fa4a6

Campus : Coimbatore

School : School of Engineering

Center : Computational Engineering and Networking

Department : Electronics and Communication, CISAI

Year : 2018

Abstract : Part-of-Speech (POS) tagging is an important task in Natural Language Processing, and numerous taggers have been developed for several languages. Many POS taggers have also been developed for Sanskrit, one of the oldest languages in the world. However, machine learning based POS tagging has received less attention. In this paper, various deep learning algorithms are used to implement a POS tagger for Sanskrit. The problem is framed as a sequence labeling problem at the character level: a word to be POS tagged is treated as a sequence of characters, and the sequential relationship among the characters in the word is captured with deep learning algorithms such as the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) network, Gated Recurrent Unit (GRU) and their bidirectional versions. The character level formulation reduces the memory requirement compared to word level implementations and also increases the labeling accuracy. The performance of the labeling task was analyzed with different combinations of hyper-parameters. We obtained an accuracy of 97.86% with the Bidirectional GRU. The character level implementations of both the unidirectional and bidirectional forms of RNN, LSTM and GRU outperformed all word level implementations in terms of accuracy, number of trainable parameters and storage requirement.
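The character level formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy words, their transliteration, the tags, the helper names (`build_char_vocab`, `encode_word`, `embedding_params`), the embedding size and the two vocabulary sizes are all assumptions chosen for illustration. The sketch shows how a word becomes a padded sequence of character ids, and why an embedding table indexed by characters can need far fewer weights than one indexed by words.

```python
# Hypothetical toy data: words and tags are illustrative only and are
# not taken from the paper's dataset or tagset.
corpus = [("rAmaH", "NOUN"), ("gacCati", "VERB"), ("vanam", "NOUN"), ("ca", "CONJ")]


def build_char_vocab(words):
    """Map each distinct character to an integer id; id 0 is reserved for padding."""
    chars = sorted({ch for word in words for ch in word})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode_word(word, char_vocab, max_len):
    """Encode a word as a fixed-length list of character ids, right-padded with 0."""
    ids = [char_vocab[ch] for ch in word[:max_len]]
    return ids + [0] * (max_len - len(ids))


def embedding_params(vocab_size, dim):
    """Number of weights in an embedding table with one row per vocabulary entry."""
    return vocab_size * dim


words = [word for word, _tag in corpus]
char_vocab = build_char_vocab(words)
max_len = max(len(word) for word in words)

# Each word is now a sequence of character ids that a recurrent model
# (RNN, LSTM or GRU) could read one step at a time.
encoded = [encode_word(word, char_vocab, max_len) for word in words]

# Assumed sizes, for illustration only: a character inventory of a few
# dozen symbols versus a word vocabulary of tens of thousands of forms.
EMBED_DIM = 32
ASSUMED_CHAR_VOCAB = 60
ASSUMED_WORD_VOCAB = 50_000

char_table_weights = embedding_params(ASSUMED_CHAR_VOCAB, EMBED_DIM)
word_table_weights = embedding_params(ASSUMED_WORD_VOCAB, EMBED_DIM)

print(encoded)
print(char_table_weights, word_table_weights)
```

Under these assumed sizes the embedding table shrinks by the ratio of the two vocabulary sizes. The actual savings reported in the paper depend on its corpus and model configuration, which are not given here.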

Cite this Research Publication : B. Premjith, Dr. Soman K. P., and Prabaharan Poornachandran, “A deep learning based Part-of-Speech (POS) tagger for Sanskrit language by embedding character level features”, in FIRE'18, 2018.
