Publication Type : Conference Paper
Publisher : IEEE
Source : In 2013 Fifth International Conference on Advanced Computing (ICoAC)
Url : https://ieeexplore.ieee.org/document/6921933
Campus : Chennai
School : School of Engineering
Department : Computer Science and Engineering
Year : 2013
Abstract : The evolution of cloud computing over the Internet and the drastic increase in data size and intensity (Big Data) have made MapReduce and distributed file systems such as HDFS (Hadoop Distributed File System) the paradigm of choice for distributed data mining applications. With the size and complexity of data growing every day, distributed data mining algorithms have to be designed to handle Big Data in a way that is compatible with the latest distributed computing technology. Earlier research in data mining focused on increasing the performance of single-task computing algorithms rather than on distributed computing, which would provide a faster and more scalable environment for processing large datasets. Existing algorithms for distributed frequent pattern mining include TPFP-tree, BTP-tree, and CARM, but these algorithms suffer from unbalanced workload management among their clusters. In this paper, a novel algorithm named Association Rule Mining based on Hadoop (ARMH) is proposed to utilize the clusters effectively and mine frequent patterns from large databases. The Hadoop distributed framework helps in managing the workload among the clusters. ARMH was implemented in Hadoop using the MapReduce programming paradigm.
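Illustration : The abstract does not describe the internals of ARMH, so the following is not the authors' algorithm. It is only a minimal, single-process sketch of how itemset support counting, the basic step of frequent pattern mining, can be phrased as a map phase and a reduce phase. The function names (`map_phase`, `reduce_phase`, `frequent_itemsets`), the `max_size` limit, and the sample transactions are all assumptions made for illustration; a real Hadoop job would distribute the map and reduce steps across cluster nodes.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, max_size=2):
    """Emit (itemset, 1) for every itemset of up to max_size items.

    Hypothetical mapper: in Hadoop, one call would handle one input record.
    """
    items = sorted(set(transaction))  # canonical order so keys match across records
    for k in range(1, max_size + 1):
        for itemset in combinations(items, k):
            yield itemset, 1


def reduce_phase(pairs):
    """Sum the counts emitted for each itemset key (hypothetical reducer)."""
    counts = Counter()
    for itemset, count in pairs:
        counts[itemset] += count
    return counts


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets whose support count is at least min_support."""
    pairs = (pair for t in transactions for pair in map_phase(t, max_size))
    counts = reduce_phase(pairs)
    return {s: c for s, c in counts.items() if c >= min_support}


# Made-up sample data, not taken from the paper.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
result = frequent_itemsets(transactions, min_support=3)
print(result)
```

With this sample data each single item appears in three transactions and each pair in only two, so only the single-item sets pass a minimum support of 3. Enumerating every candidate itemset in the mapper grows quickly with transaction length, which is one reason tree-based methods such as those named in the abstract exist.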
Cite this Research Publication :
Natarajan, S. and Sehar, S., 2013, December. A novel algorithm for distributed data mining in HDFS. In 2013 Fifth International Conference on Advanced Computing (ICoAC) (pp. 93-99). IEEE.