Publication Type : Journal Article
Publisher : Advances in Intelligent Systems and Computing
Source : Advances in Intelligent Systems and Computing, Springer Verlag, Volume 709, p.31-41 (2018)
Keywords : Character recognition, Conditional random field, Data mining, extraction, Feature extraction, Geoplanet, MCRF, Named entities, Named entity recognition, Natural language processing systems, Person ontologies, random processes, Sentence structures, Text, Text processing, Web services
Campus : Amritapuri, Bengaluru
School : Department of Computer Science and Engineering, School of Engineering
Center : Computational Linguistics and Indic Studies
Department : Computer Science, Mathematics
Year : 2018
Abstract : Named entity recognition in documents is an active and challenging research topic in text mining. The major objective of our work is to extract a phrase from a sentence and classify it into one of the predefined named entity classes. The proposed system works in two phases. In the first phase, every word in the phrase is tagged using word feature extraction approaches. In the second phase, the model recognizes named entities at the phrase level using a Modified Conditional Random Field (MCRF). This work identifies four classes of entities: Person, Organization, Location and Other. Our algorithm first parses the text document and identifies the sentence structure, from which concepts are extracted. The feature extraction module makes use of the Yahoo GeoPlanet Web service to identify locations. We have created a person ontology of all available Indian names to check whether a word is a name. To check whether a word is an organization, we use a database of company name indicators. Finally, our MCRF assigns a label to the tagged phrase. © Springer Nature Singapore Pte Ltd. 2018
Cite this Research Publication : G. Veena, Dr. Deepa Gupta, Lakshmi S, and Jacob, J. T., “Named entity recognition in text documents using a modified conditional random field”, Advances in Intelligent Systems and Computing, vol. 709, pp. 31-41, 2018.
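
The two-phase flow described in the abstract (word-level tagging from lookup resources, then a phrase-level label) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the small sets stand in for the person ontology, the GeoPlanet lookup, and the company name indicator database, and the functions `tag_word` and `label_phrase` are hypothetical. The second phase is a simple majority vote, not the paper's Modified Conditional Random Field, which the abstract does not specify in detail.

```python
from collections import Counter

# Hypothetical stand-ins for the resources named in the abstract.
PERSON_NAMES = {"veena", "deepa", "lakshmi"}           # stands in for the person ontology
LOCATIONS = {"bengaluru", "amritapuri", "india"}        # stands in for a GeoPlanet lookup
ORG_INDICATORS = {"ltd", "inc", "corp", "university"}   # stands in for company name indicators


def tag_word(word: str) -> str:
    """Phase 1: assign a word-level tag from simple lookups."""
    w = word.lower().strip(".,")
    if w in ORG_INDICATORS:
        return "Organization"
    if w in LOCATIONS:
        return "Location"
    if w in PERSON_NAMES:
        return "Person"
    return "Other"


def label_phrase(phrase: str) -> str:
    """Phase 2 placeholder: majority vote over the non-Other word tags.

    The paper applies a Modified Conditional Random Field at this step;
    the vote below is only a stand-in so the sketch runs end to end.
    """
    tags = [tag_word(w) for w in phrase.split()]
    informative = [t for t in tags if t != "Other"]
    if not informative:
        return "Other"
    return Counter(informative).most_common(1)[0][0]
```

With these toy resources, `label_phrase("Infosys Ltd")` yields `Organization` because of the indicator word, and a phrase with no matching word falls back to `Other`. A sequence model would replace the vote so that neighbouring tags and sentence structure influence the final label.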