Publication Type : Conference Paper
Publisher : IEEE
Source : Proceedings of IEEE InC4 2023 - 2023 IEEE International Conference on Contemporary Computing and Communications, 2023. DOI: 10.1109/InC457730.2023.10263218
Url : https://ieeexplore.ieee.org/document/10263218
Campus : Bengaluru
Department : Electrical and Electronics
Year : 2023
Abstract : The growing popularity of social media platforms and microblogging websites has led to an increase in the expression of views and opinions. However, conversations and debates on these platforms often give rise to toxic comments, which consist of insulting and hateful remarks. To address this issue, social media systems need to be able to recognize harmful comments. With the rising incidence of cyberbullying, it is crucial to study the classification of toxic comments using various algorithms. This study compares the effectiveness of different word and sentence embedding methods, including TF-IDF, InferSent, BERT, and T5, for toxic comment classification. It also examines the impact of using SMOTE to balance the highly imbalanced dataset. The results of these models are compared and analysed. The T5 embedding with a Random Forest classifier is observed to perform best, with an F1-score of 0.91.
Cite this Research Publication : Kumar, A.A., Pati, P.B., Deepa, K., Sangeetha, S.T., "Toxic Comment Classification Using S-BERT Vectorization and Random Forest Algorithm", Proceedings of IEEE InC4 2023 - 2023 IEEE International Conference on Contemporary Computing and Communications, 2023. DOI: 10.1109/InC457730.2023.10263218
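
Illustrative sketch : The abstract mentions SMOTE for balancing the dataset and the F1-score as the evaluation metric, but gives no implementation details. The following is a minimal pure-Python sketch of the general idea behind both, not the authors' code. The function names (`smote`, `f1_score`), the parameters `k` and `seed`, and the toy 2-D vectors standing in for comment embeddings are all assumptions made for illustration. In practice a library implementation such as the one in imbalanced-learn would normally be used instead.

```python
import random
from math import dist


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(x, p))[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, nb)])
    return synthetic


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented): 6 "non-toxic" vectors versus 2 "toxic" vectors.
majority = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2], [0.0, 0.0], [0.2, 0.2]]
minority = [[1.0, 1.0], [1.2, 0.8]]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, len(majority) - len(minority))
balanced_minority = minority + new_points
```

Oversampling of this kind is usually applied only to the training split, so that the F1-score is measured on data with the original class distribution.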