Publication Type : Journal Article
Publisher : Research Gate
Source : Advances in Modelling and Analysis B 60(3):525-538 DOI:10.18280/ama_b.600301
Campus : Amaravati
School : School of Engineering
Department : Computer Science and Engineering
Year : 2017
Abstract : The exponential growth of complex, heterogeneous, dynamic, and unbounded data, produced by a range of fields including health, genomics, materials science, climatology, and social networks, poses significant challenges for data processing and for achieving the desired speed and performance. Clustering is the task of grouping objects so that items in the same group (cluster) are more similar to each other than to those in other groups. It is an exploratory data analysis technique that partitions a dataset into several groups. Many clustering methods are available, and different algorithms suit different types of data. K-means is the most widely used clustering algorithm. Big data analytics includes many important data mining tasks, among them clustering, which organizes data into meaningful clusters based on the similarity or dissimilarity among objects. Experiments are performed on a benchmark dataset to assess the feasibility and efficiency of our algorithm. Very large volumes of data (gigabytes to terabytes) are processed and analysed in a big data environment. For cluster analysis, the k-means algorithm is run on Hadoop with MapReduce to analyse high-dimensional datasets. In big data analytics, clustering is applied to unlabelled data in order to group it into clusters. The conventional k-means algorithm does not work well with the Hadoop framework and MapReduce programming, so the algorithm must be modified to improve the performance of the data analysis. A new clustering algorithm that improves on conventional k-means is therefore proposed and implemented.
This approach first improves the quality of the data by removing outlier points from the dataset, and then uses a bi-part technique to perform the clustering. The proposed clustering algorithm is implemented using the Hadoop framework and MapReduce programming. Finally, its performance is evaluated and compared with that of the conventional k-means clustering technique. The results show effective outcomes and improved accuracy of cluster formation once the sources of inefficiency are removed. The proposed work can therefore be applied in big data environments to improve clustering performance.
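The abstract names two stages, outlier removal followed by a bi-part split, without giving their details. The following is a minimal single-machine sketch of that general idea, not the authors' implementation. It assumes that "bi-part" refers to a bisecting k-means split, that outliers are dropped by a z-score rule on distance to the global centroid, and that the cluster with the largest sum of squared errors is split next. The function names and the `z_max` threshold are hypothetical, and the Hadoop/MapReduce distribution is not shown.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def remove_outliers(points, z_max=2.0):
    """Assumed rule: drop points whose distance to the global centroid
    lies more than z_max standard deviations above the mean distance."""
    c = mean(points)
    d = [dist(p, c) for p in points]
    mu = sum(d) / len(d)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in d) / len(d))
    if sigma == 0:
        return list(points)
    return [p for p, x in zip(points, d) if (x - mu) / sigma <= z_max]

def sse(cluster):
    """Sum of squared distances from each point to the cluster mean."""
    c = mean(cluster)
    return sum(dist(p, c) ** 2 for p in cluster)

def two_means(cluster, iters=20):
    """Split one cluster in two with plain k-means (k=2).
    Seeding is deterministic: the point farthest from the mean,
    then the point farthest from that first seed."""
    c = mean(cluster)
    a = max(cluster, key=lambda p: dist(p, c))
    b = max(cluster, key=lambda p: dist(p, a))
    left, right = list(cluster), []
    for _ in range(iters):
        new_left = [p for p in cluster if dist(p, a) <= dist(p, b)]
        new_right = [p for p in cluster if dist(p, a) > dist(p, b)]
        if not new_left or not new_right:
            break  # keep the last valid split
        left, right = new_left, new_right
        new_a, new_b = mean(left), mean(right)
        if (new_a, new_b) == (a, b):
            break  # centres stopped moving
        a, b = new_a, new_b
    return left, right

def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(clusters, key=sse)
        if sse(worst) == 0:
            break  # every remaining cluster is a single repeated point
        left, right = two_means(worst)
        if not right:
            break
        clusters.remove(worst)
        clusters.extend([left, right])
    return clusters
```

In a MapReduce setting, the assignment of points to the nearer centre would typically be the map step and the recomputation of each mean the reduce step; that mapping is a common pattern for k-means and is offered here only as an illustration.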
Cite this Research Publication : A. Padma Priya, Thulasi Bikku, "A novel algorithm for clustering and feature selection of high dimensional datasets", Advances in Modelling and Analysis B 60(3):525-538 DOI:10.18280/ama_b.600301