Publication Type : Journal Article
Publisher : Springer
Source : Journal of Big Data 9, 45 (2022). https://doi.org/10.1186/s40537-022-00594-3.
Url : https://journalofbigdata.springeropen.com/articles/10.1186/s40537-022-00594-3
Campus : Amritapuri
School : School of Computing
Center : Computational Linguistics and Indic Studies
Year : 2022
Abstract : Social media content often follows zigzag conversational patterns and is commonly perceived as noisy or informal text. The unrestricted vocabulary used in social media communication further complicates the processing of code-mixed text. This paper focuses on two major tasks for code-mixed text, Offensive Language Identification and Sentiment Analysis, on Malayalam–English code-mixed data. The proposed framework addresses three key aspects of these tasks: the dependencies among features created by embedding methods (Word2Vec and FastText); a comparative analysis of deep learning algorithms (uni-/bi-directional models, hybrid models, and transformer approaches); and the relevance of selective translation and transliteration, together with hyper-parameter optimization. The framework achieved F1-scores of 0.76 on the Forum for Information Retrieval Evaluation (FIRE) 2020 data set and 0.99 on the European Chapter of the Association for Computational Linguistics (EACL) 2021 data set. A detailed error analysis was also carried out to provide meaningful insights. The proposed strategy achieved the best results among the benchmarked models for Malayalam–English code-mixed messages and serves as an important step towards societal good.
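The results above are reported as F1-scores, which are not the same as accuracy. F1 is the harmonic mean of precision and recall, computed per class and then averaged across classes. The minimal Python sketch below shows per-class, macro-averaged and support-weighted F1 next to plain accuracy. It is not the authors' evaluation code: the abstract does not state which averaging produced the reported 0.76 and 0.99, and the label names and toy predictions are hypothetical.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (per-class F1 dict, macro-averaged F1, support-weighted F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of gold examples per class
    per_class = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Harmonic mean of precision and recall; 0 when both are 0.
        per_class[label] = (
            2 * precision * recall / (precision + recall)
            if precision + recall else 0.0
        )
    macro = sum(per_class.values()) / len(labels)  # every class counts equally
    weighted = sum(per_class[l] * support[l] for l in labels) / len(y_true)
    return per_class, macro, weighted


# Hypothetical toy sentiment labels, for illustration only.
y_true = ["Positive", "Negative", "Positive", "Neutral", "Positive", "Negative"]
y_pred = ["Positive", "Positive", "Positive", "Neutral", "Negative", "Negative"]

per_class, macro, weighted = f1_scores(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(accuracy, 3), round(macro, 3), round(weighted, 3))  # 0.667 0.722 0.667
```

On this toy data the macro-averaged F1 (about 0.722) differs from accuracy (about 0.667) because the rare Neutral class, predicted perfectly, carries the same weight as the larger classes. This sensitivity to class imbalance is why F1 is usually preferred over accuracy for skewed code-mixed data sets.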
Cite this Research Publication : Thara, S., Poornachandran, P. "Social media text analytics of Malayalam–English code-mixed using deep learning." Journal of Big Data 9, 45 (2022). https://doi.org/10.1186/s40537-022-00594-3.