Publication Type : Conference Paper
Publisher : Proceedings of the 2016 International Conference on Data Science and Engineering
Source : Proceedings of the 2016 International Conference on Data Science and Engineering, ICDSE 2016, 1-7
ISBN : 9781509012800
Keywords : Distributed and parallel computing, Distributed computer systems, Experimental analysis, K medoid clustering, Map-reduce, Mapreduce frameworks, Multiprocessing systems, Overall execution, Scalable clustering, Transmission costs
Campus : Amritapuri
School : School of Engineering, Department of Computer Science and Engineering
Center : AI (Artificial Intelligence) and Distributed Systems
Department : Computer Science
Year : 2016
Abstract : Distributed and parallel computing are well suited to scalable clustering of large volumes of data with moderate to high dimensionality, and they offer improved speedup. In this paper we address the problem of k-medoid clustering using the MapReduce framework for distributed computing on commodity machines, in order to evaluate its efficacy. Two main issues must be tackled: first, how to distribute the data for efficient clustering, and second, how to minimize the I/O and network cost among the machines. The main contributions of this paper are: (a) a MapReduce methodology for distributed k-medoid clustering; and (b) a reduction in the overall execution time and in the overhead of moving data from one site to another, leading to sublinear scaleup and speedup. The approach proves to be efficient because the local clusterings can be carried out independently of each other. Experimental analysis on millions of data points using just 10 cores in parallel shows that clustering data of size 1M × 17 requires only 4 minutes. The low transmission cost and low bandwidth requirement thus lead to improved speedup and scaleup on distributed data. © 2016 IEEE.
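Illustrative sketch : The abstract does not give the algorithmic details, so the following Python sketch is only one plausible reading of a map/reduce split for k-medoid clustering, not the authors' implementation. It assumes that each mapper clusters its own partition independently and emits only its k local medoids, and that a single reducer re-clusters those candidates to obtain k global medoids, so that little data crosses the network. The function names (`k_medoids`, `map_phase`, `reduce_phase`, `distributed_k_medoids`), the Euclidean distance, and the simple alternating update rule (used here in place of PAM) are all assumptions made for illustration.

```python
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def k_medoids(points, k, iters=20, seed=0):
    """Simple alternating k-medoids (assumed variant, not PAM)."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)  # requires len(points) >= k
    for _ in range(iters):
        # Assignment step: attach each point to its nearest medoid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, medoids[i]))
            clusters[nearest].append(p)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = []
        for i, members in enumerate(clusters):
            if not members:
                new_medoids.append(medoids[i])  # keep an empty cluster's medoid
                continue
            best = min(members, key=lambda c: sum(dist(c, q) for q in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


def map_phase(partition, k):
    """Hypothetical mapper: cluster one partition, emit its local medoids."""
    return k_medoids(partition, k)


def reduce_phase(local_medoid_lists, k):
    """Hypothetical reducer: re-cluster all local medoids into k global ones."""
    candidates = [m for medoids in local_medoid_lists for m in medoids]
    return k_medoids(candidates, k)


def distributed_k_medoids(partitions, k):
    # On a real cluster the map calls would run in parallel on separate
    # machines; here they run sequentially for clarity.
    local = [map_phase(part, k) for part in partitions]
    return reduce_phase(local, k)


if __name__ == "__main__":
    partitions = [
        [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)],
        [(1.0, 0.0), (1.0, 1.0), (11.0, 10.0), (11.0, 11.0)],
    ]
    print(distributed_k_medoids(partitions, 2))
```

In this sketch only k points per partition reach the reducer, which is one way the low transmission cost mentioned in the abstract could arise; the actual data distribution and merging strategy used in the paper may differ.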
Cite this Research Publication : Sandhya Harikumar and Thaha, S. S., “MapReduce model for k-medoid clustering”, in Proceedings of the 2016 International Conference on Data Science and Engineering, ICDSE 2016, 1-7