Publication Type : Conference Paper
Publisher : 2019 9th International Conference on Advances in Computing and Communication (ICACC)
Source : Proceedings of the 2019 9th International Conference on Advances in Computing and Communication, ICACC 2019, 2019, pp. 79–85, 8986223
Keywords : news scraping, information extraction, web crawling, named entity recognizer, latent Dirichlet allocation, text similarity
Campus : Amritapuri
School : Department of Computer Science and Engineering
Center : AI (Artificial Intelligence) and Distributed Systems
Department : Computer Science
Year : 2019
Abstract : The importance of news media is unquestionable. News articles contain a great amount of information embedded in their text. For analytics, it is very important to extract this information and organize it so that conclusions can be drawn. The objective of our work is to design a tool that extracts details from English news articles and presents them to the user in an organized manner. A predefined set of websites is crawled and the extracted details are stored. For each news article, the tool extracts the named entities it mentions (locations, persons, and organizations), a summary of the news, and its important keywords. We also equip the tool with an efficient search engine, along with database indexing for faster information retrieval.
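The search-with-indexing step described in the abstract can be illustrated with an inverted index over the extracted details. The following is a minimal sketch that assumes each article's details are stored as a dictionary of string or list fields. The class name `ArticleIndex`, the field names, the sample articles, and the match-count ranking are all hypothetical. They are not taken from the paper, whose database and indexing scheme are not specified here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ArticleIndex:
    """Hypothetical in-memory inverted index over stored article details."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of article ids containing it
        self.articles = {}                # article id -> stored details

    def add(self, article_id, details):
        """Index every token of every field (entities, keywords, summary)."""
        self.articles[article_id] = details
        for value in details.values():
            values = value if isinstance(value, list) else [value]
            for item in values:
                for term in tokenize(item):
                    self.postings[term].add(article_id)

    def search(self, query):
        """Return article ids ranked by the number of distinct query terms matched."""
        scores = defaultdict(int)
        for term in set(tokenize(query)):
            for article_id in self.postings.get(term, ()):
                scores[article_id] += 1
        # Higher score first; ties broken by article id for a stable order.
        return sorted(scores, key=lambda a: (-scores[a], a))


# Illustrative (made-up) articles with already-extracted details.
index = ArticleIndex()
index.add("a1", {"persons": ["Jane Doe"], "locations": ["Delhi"],
                 "keywords": ["election"], "summary": "Results announced."})
index.add("a2", {"locations": ["Kerala"], "keywords": ["flood", "rain"],
                 "summary": "Heavy rain in Kerala."})

print(index.search("Kerala rain"))     # ['a2']
print(index.search("Delhi election"))  # ['a1']
```

An inverted index lets a query touch only the articles that share a term with it, rather than scanning every stored article. A deployed system would more likely rely on the database's own full-text index than on an in-memory structure like this one.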