Publication Type : Journal Article
Publisher : International Journal of Computer Technology and Applications (IJCTA)
Source : International Journal of Computer Technology and Applications (IJCTA), Volume 5, Issue 5 (2014)
Url : http://www.ijcta.com/documents/volumes/vol5issue5/ijcta2014050517.pdf
Campus : Bengaluru
School : Department of Computer Science and Engineering, School of Engineering
Department : Computer Science
Year : 2014
Abstract : Analysing web log files has become an important task for E-Commerce companies to predict their customer behaviour and to improve their business. Each click on an E-Commerce web page creates about 100 bytes of data. Large E-Commerce websites such as flipkart.com, amazon.in and ebay.in are visited by millions of customers simultaneously. As a result, these customers generate petabytes of data in their web log files. Because the web log files are so large, parallel processing and a reliable data storage system are required to process them, and the Hadoop framework provides both. Hadoop offers the Hadoop Distributed File System (HDFS) and the MapReduce programming model for processing huge datasets efficiently and effectively. In this paper, a NASA web log file is analysed using the Hadoop framework to calculate the total number of hits received by each web page in a website and the total number of hits received by the website in each hour, and it is shown that the Hadoop framework takes less response time to produce accurate results.
Keywords : Hadoop, MapReduce, Log Files, Parallel Processing, Hadoop Distributed File System, E-Commerce
Cite this Research Publication : S. Saravanan and B. Uma Maheswari, “Analyzing Large Web Log Files in a Hadoop Distributed Cluster Environment”, International Journal of Computer Technology and Applications (IJCTA), vol. 5, no. 5, 2014.
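The paper's source code is not reproduced in this record; the following is a minimal Java MapReduce sketch of the hits-per-page job the abstract describes. The class names (PageHitCount, HitMapper, SumReducer) and the assumption that the NASA log uses the Common Log Format are illustrative, not taken from the paper.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PageHitCount {

  // Mapper: emit (requested URL, 1) for each log line.
  // Assumes NASA Common Log Format lines such as:
  // host - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245
  public static class HitMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text page = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String line = value.toString();
      int start = line.indexOf('"');
      int end = line.indexOf('"', start + 1);
      if (start < 0 || end < 0) return;            // skip malformed lines
      String[] request = line.substring(start + 1, end).split(" ");
      if (request.length < 2) return;
      page.set(request[1]);                        // the requested URL
      context.write(page, ONE);
    }
  }

  // Reducer: sum the 1s to obtain the total hits per page.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "page hit count");
    job.setJarByClass(PageHitCount.class);
    job.setMapperClass(HitMapper.class);
    job.setCombinerClass(SumReducer.class);        // local aggregation before the shuffle
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The hits-per-hour computation the abstract also mentions would follow the same pattern: the mapper would parse the timestamp between '[' and ']' and emit the hour as the key, and the same summing reducer would apply unchanged.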