Publication Type : Conference Paper
Publisher : IEEE
Source : Recent Trends in Information, Telecommunication and Computing (ITC), 2010 International Conference on, IEEE, Kochi, Kerala, pp. 339-341 (2010)
Campus : Coimbatore
School : School of Engineering
Center : Computational Engineering and Networking
Department : Electronics and Communication
Year : 2010
Abstract : This paper presents the building of a part-of-speech (POS) tagger for the Malayalam language using a Support Vector Machine (SVM). A POS tagger plays an important role in natural language applications such as speech recognition, natural language parsing, information retrieval, and information extraction. This supervised machine learning approach to POS tagging requires a large annotated training corpus to tag properly. At the initial stage of POS tagging for Malayalam, the model was trained with a very limited annotated corpus. We then tried to maximize performance with a substantial amount of annotated corpus. The objective of this project was to identify the ambiguities in Malayalam lexical items and develop an efficient and accurate POS tagger. We developed our own tagset for training and testing the POS tagger generators. The present tagset consists of 29 tags. A corpus of one hundred and eighty thousand words was used for training and testing the accuracy of the tagger generators. We found that the results obtained were more efficient and accurate than those of earlier methods for Malayalam POS tagging.
Cite this Research Publication : P. J. Antony, S. P. Mohan, and K. P. Soman, “SVM Based Part of Speech Tagger for Malayalam”, in Recent Trends in Information, Telecommunication and Computing (ITC), 2010 International Conference on, Kochi, Kerala, 2010, pp. 339-341.