Named Entity Recognition Using Deep Learning and BERT for Tamil and Hindi Languages

Publication Type : Conference Paper

Publisher : Springer (Lecture Notes in Electrical Engineering series)

Source : International Conference on Advances in Data Science and Computing Technologies (2023), Lecture Notes in Electrical Engineering, 1056 LNEE, pp. 395-403. DOI: 10.1007/978-981-99-3656-4_40

Url : https://link.springer.com/chapter/10.1007/978-981-99-3656-4_40

Campus : Coimbatore

School : School of Artificial Intelligence, School of Artificial Intelligence - Coimbatore

Year : 2023

Abstract : Named Entity Recognition (NER) is a subtask of Information Extraction (IE) that captures textual information and word meaning from unstructured input. The goal of NER is to identify every word or symbol in a document and assign it to one of a set of predefined categories. Information extraction from Indian-language text is gaining popularity among Indian researchers, and because extracting meaningful, relevant information from unstructured data is crucial, NER plays a key part in this process. In this work, NER is performed on multilingual data in Hindi and Tamil. Word2vec, fastText, and BERT are used to create embeddings, which are then fed to deep learning and machine learning (ML) models. Three neural architectures are proposed on the Word2vec and fastText embeddings: Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory-Convolutional Neural Network (Bi-LSTM-CNN), and Long Short-Term Memory-Convolutional Neural Network (LSTM-CNN), with overall F-measures of 85–88%. LSTM-CNN with fastText embeddings achieved an F-measure of 88.3% on the Hindi data, and LSTM-CNN with Word2vec embeddings achieved 88.66% on the Tamil data. Entity predictions were also evaluated with the ML models Decision Tree, SVM, Logistic Regression, and Random Forest on Bidirectional Encoder Representations from Transformers (BERT) embeddings. On these embeddings, SVM achieved F-measures of 99.6% and 99.44% on the Hindi and Tamil data, respectively.
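The abstract reports results as F-measures but does not state how they are averaged. As a hedged illustration only, the sketch below computes a micro-averaged, token-level precision, recall, and F-measure (the harmonic mean of precision and recall) that ignores the non-entity tag. The tag names (PER, LOC, O), the choice to exclude O, and the function name token_f1 are assumptions for illustration, not the authors' evaluation code.

```python
def token_f1(gold, pred, ignore="O"):
    """Micro-averaged token-level precision, recall and F-measure.

    Assumption: tokens tagged `ignore` (the hypothetical non-entity tag)
    never count as true positives, so only entity tags are scored.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != ignore and p == g:
            tp += 1          # correct entity tag
        else:
            if p != ignore:
                fp += 1      # predicted an entity tag that is wrong
            if g != ignore:
                fn += 1      # missed (or mislabelled) a gold entity
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted tags for a five-token sentence.
gold = ["PER", "O", "LOC", "LOC", "O"]
pred = ["PER", "O", "LOC", "O", "PER"]
p, r, f = token_f1(gold, pred)
print(round(p, 3), round(r, 3), round(f, 3))  # → 0.667 0.667 0.667
```

Entity-level scoring (counting whole multi-token spans rather than single tokens) is a common alternative and can give different numbers, so reported F-measures are only comparable when the scoring scheme matches.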

Cite this Research Publication : Menon, S., Sanjanasri, J.P., Premjith, B., Soman, K.P., "Named Entity Recognition Using Deep Learning and BERT for Tamil and Hindi Languages," International Conference on Advances in Data Science and Computing Technologies (2023), Lecture Notes in Electrical Engineering, 1056 LNEE, pp. 395-403. DOI: 10.1007/978-981-99-3656-4_40