
MapReduce model for k-medoid clustering

Publication Type : Conference Paper

Publisher : Proceedings of the 2016 International Conference on Data Science and Engineering

Source : Proceedings of the 2016 International Conference on Data Science and Engineering, ICDSE 2016, 1-7

Url : https://www.scopus.com/inward/record.uri?eid=2-s2.0-85015152766&doi=10.1109%2fICDSE.2016.7823940&partnerID=40&md5=f7708f36a2d16f004bccf0d669df721c

ISBN : 9781509012800

Keywords : Distributed and parallel computing, Distributed computer systems, Experimental analysis, K medoid clustering, Map-reduce, Mapreduce frameworks, Multiprocessing systems, Overall execution, Scalable clustering, Transmission costs

Campus : Amritapuri

School : School of Engineering, Department of Computer Science and Engineering

Center : AI (Artificial Intelligence) and Distributed Systems

Department : Computer Science

Year : 2016

Abstract : Distributed and parallel computing are the best alternatives for scalable clustering of large volumes of data with moderate to high dimensionality, and they also improve speedup. In this paper we address the problem of k-medoid clustering using the MapReduce framework for distributed computing on commodity machines, in order to evaluate its efficacy. Two main issues must be tackled: first, how to distribute the data for efficient clustering; second, how to minimize the I/O and network cost among the machines. The main contributions of this paper are: (a) a MapReduce methodology for distributed k-medoid clustering; (b) a reduction in the overall execution time and in the overhead of moving data from one site to another, leading to sublinear scaleup and speedup. The approach proves efficient because the local clusterings can be carried out independently of one another. Experimental analysis on millions of data points using just 10 cores in parallel shows that clustering data of size 1M × 17 requires only 4 minutes. This low transmission cost and low bandwidth requirement lead to improved speedup and scaleup on the distributed data. © 2016 IEEE.
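
Illustrative sketch (not taken from the paper): the abstract says that local clustering runs independently on each partition and that little data has to move between machines. One plausible reading, assumed here, is a map step that runs k-medoids on each partition and a reduce step that clusters only the resulting local medoids. All function names below are hypothetical, the partitions are simulated in a single process rather than on a real MapReduce cluster, and the local solver is a simple alternating (Voronoi iteration) k-medoids, which may differ from the algorithm the authors used.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def k_medoids(points: Sequence[Point], k: int, iters: int = 20, seed: int = 0) -> List[Point]:
    """Alternating k-medoids: assign points to the nearest medoid, then
    pick the member of each cluster with the smallest total distance.
    Assumes len(points) >= k (random.sample raises ValueError otherwise)."""
    rng = random.Random(seed)
    medoids = rng.sample(list(points), k)
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, medoids[i]))
            clusters[nearest].append(p)
        new_medoids = []
        for i, members in enumerate(clusters):
            if not members:
                # Keep the old medoid if its cluster ended up empty.
                new_medoids.append(medoids[i])
                continue
            best = min(members, key=lambda c: sum(dist(c, m) for m in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # converged
        medoids = new_medoids
    return medoids


def map_phase(partition: Sequence[Point], k: int) -> List[Point]:
    """Hypothetical map step: cluster one partition on its own."""
    return k_medoids(partition, k)


def reduce_phase(local_medoids: List[List[Point]], k: int) -> List[Point]:
    """Hypothetical reduce step: cluster only the local medoids, so at
    most k points per partition cross the network."""
    candidates = [m for medoids in local_medoids for m in medoids]
    return k_medoids(candidates, k)


def distributed_k_medoids(data: Sequence[Point], k: int, n_partitions: int) -> List[Point]:
    """Simulate the map and reduce steps sequentially in one process."""
    partitions = [data[i::n_partitions] for i in range(n_partitions)]
    local = [map_phase(p, k) for p in partitions]
    return reduce_phase(local, k)
```

Under this assumption, each mapper emits only k medoids instead of its full partition, which is one way the low transmission cost described in the abstract could arise. The result is an approximation: the final medoids are chosen from the local medoids, not from the full data set.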

Cite this Research Publication : Sandhya Harikumar and Thaha, S. S., “MapReduce model for k-medoid clustering”, in Proceedings of the 2016 International Conference on Data Science and Engineering, ICDSE 2016, 1-7
