
Toxic Comment Classification Using S-BERT Vectorization and Random Forest Algorithm

Publication Type : Conference Paper

Publisher : IEEE

Source : Proceedings of IEEE InC4 2023 - 2023 IEEE International Conference on Contemporary Computing and Communications, 2023. DOI: 10.1109/InC457730.2023.10263218

Url : https://ieeexplore.ieee.org/document/10263218

Campus : Bengaluru

Department : Electrical and Electronics

Year : 2023

Abstract : The growing popularity of social media platforms and microblogging websites has led to an increase in the expression of views and opinions. However, conversations and debates on these platforms often lead to toxic comments, which consist of insulting and hateful remarks. To address this issue, it is important for social media systems to be able to recognize harmful comments. With the rising incidence of cyberbullying, it is crucial to study the classification of toxic comments using various algorithms. This study compares the effectiveness of different word and sentence embedding methods, including TF-IDF, InferSent, BERT, and T5, for toxic comment classification. A comparative study is also conducted on the impact of using SMOTE to balance the highly imbalanced dataset. The results of these models are compared and analysed. It is observed that T5 embedding with a Random Forest classifier performs best, achieving an F1-score of 0.91.
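
Illustration : The abstract mentions SMOTE for balancing the dataset but does not describe how it was applied. The sketch below is not the authors' implementation; it is a minimal, standard-library-only illustration of the core SMOTE idea, which creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote_oversample`, the parameter defaults, and the toy 2-D "embeddings" are all assumptions made for this example.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from minority-class vectors.

    Hypothetical helper for illustration only; a real pipeline would
    normally rely on an established library implementation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random position on the segment from base to neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed): 6 non-toxic vectors versus 3 toxic vectors
non_toxic = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3],
             [0.3, 0.3], [0.2, 0.4], [0.1, 0.0]]
toxic = [[0.9, 0.8], [0.8, 0.9], [1.0, 1.0]]

new_points = smote_oversample(toxic, n_new=len(non_toxic) - len(toxic))
balanced_toxic = toxic + new_points
print(len(non_toxic), len(balanced_toxic))
```

Because each synthetic point lies on a segment between two real minority points, it stays inside the region already occupied by the minority class. Oversampling of this kind is usually applied to the training split only, so that evaluation scores such as F1 are computed on real data.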

Cite this Research Publication : Kumar, A.A., Pati, P.B., Deepa, K., Sangeetha, S.T., "Toxic Comment Classification Using S-BERT Vectorization and Random Forest Algorithm", Proceedings of IEEE InC4 2023 - 2023 IEEE International Conference on Contemporary Computing and Communications, 2023. DOI: 10.1109/InC457730.2023.10263218
