A scalable feature selection algorithm for large datasets - quick branch & bound iterative (QBB-I)

Publication Type : Journal Article

Thematic Areas : Learning-Technologies

Publisher : Springer Science and Business Media Deutschland GmbH

Source : Smart Innovation, Systems and Technologies, Springer Science and Business Media Deutschland GmbH, Volume 27 (Vol. 1), Kolkata, pp. 125-136 (2014)

Url : http://www.scopus.com/inward/record.url?eid=2-s2.0-84906700099&partnerID=40&md5=9f74f32052cf239a70f7afd40108d487

ISBN : 9783319073521

Keywords : Algorithms, Consistency measures, Evaluation measures, Feature extraction, Feature selection algorithm, Information science, Intelligent tutoring, Iterative methods, Optimal feature sets, Probabilistic search algorithms, QBB-Iterative, Scalable feature selection, Set theory, Statistical tests

Campus : Amritapuri

School : Department of Computer Science and Engineering, School of Engineering

Center : Amrita Center For Research in Analytics, AmritaCREATE

Department : Computer Science

Year : 2014

Abstract : Feature selection algorithms aim to effectively and efficiently find an optimal subset of relevant features in the data. As the number of features and the data size increase, new methods are needed that reduce complexity while maintaining the goodness of the selected features. We review popular feature selection algorithms such as the probabilistic-search-based Las Vegas Filter (LVF) and the complete-search-based Automatic Branch and Bound (ABB), both of which use the consistency measure. The hybrid Quick Branch and Bound (QBB) algorithm first runs LVF to find a smaller subset of valid features and then performs ABB on the reduced feature set. QBB is reasonably fast and robust and handles interdependent features, but it does not work well with large data. In this paper, we propose an enhanced QBB algorithm called QBB Iterative (QBB-I). QBB-I partitions the dataset into two, and performs QBB on the first partition to find a candidate feature subset. This feature subset is tested against the second partition using the consistency measure; the inconsistent rows, if any, are added to the first partition, and the process is repeated until the optimal feature set is found. Our tests with the ASSISTments intelligent tutoring dataset, using over 150,000 log records, and other standard datasets show that QBB-I is significantly more efficient than QBB while selecting the same subset of features. © Springer International Publishing Switzerland 2014.
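The iterative loop described in the abstract lends itself to a short sketch. The following is a minimal Python illustration, not the authors' implementation: `qbb` is a hypothetical black box standing in for the hybrid LVF + ABB search, rows are assumed to be indexable by feature, the consistency check follows the standard inconsistency-rate definition (the fraction of rows that disagree with the majority class of their projected pattern), and the handling of patterns unseen in the first partition is our assumption.

```python
from collections import Counter, defaultdict

def inconsistency_rate(rows, labels, subset):
    # Inconsistency rate of a feature subset: group rows by their projection
    # onto the subset; every row that disagrees with the majority class of
    # its group counts as inconsistent.
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[tuple(row[f] for f in subset)].append(label)
    bad = sum(len(g) - Counter(g).most_common(1)[0][1] for g in groups.values())
    return bad / len(rows)

def conflicting_rows(rows, labels, subset, ref_rows, ref_labels):
    # Indices of rows whose projection conflicts with (or is unseen in) the
    # reference partition. Treating unseen patterns as conflicts is an
    # assumption: it forces QBB to see the novel patterns on the next pass.
    counts = defaultdict(Counter)
    for row, label in zip(ref_rows, ref_labels):
        counts[tuple(row[f] for f in subset)][label] += 1
    majority = {key: c.most_common(1)[0][0] for key, c in counts.items()}
    return [i for i, (row, label) in enumerate(zip(rows, labels))
            if majority.get(tuple(row[f] for f in subset)) != label]

def qbb_iterative(rows, labels, qbb, split=0.5, threshold=0.0):
    # `qbb` is a placeholder for an existing QBB implementation (LVF followed
    # by ABB) that maps a partition to a candidate feature subset.
    n = int(len(rows) * split)
    part1, lab1 = list(rows[:n]), list(labels[:n])
    part2, lab2 = list(rows[n:]), list(labels[n:])
    while True:
        subset = qbb(part1, lab1)
        # Accept once the candidate is consistent over the full dataset.
        if inconsistency_rate(part1 + part2, lab1 + lab2, subset) <= threshold:
            return subset
        bad = conflicting_rows(part2, lab2, subset, part1, lab1)
        if not bad:                 # nothing left to migrate; stop rather than loop
            return subset
        for i in reversed(bad):     # move conflicting rows into partition 1
            part1.append(part2.pop(i))
            lab1.append(lab2.pop(i))
```

Moving only the conflicting rows keeps the partition that QBB actually searches small, which is presumably where the claimed efficiency gain over running QBB on the full dataset comes from.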

Cite this Research Publication : Prof. Prema Nedungadi and Remya, M. S., "A scalable feature selection algorithm for large datasets - quick branch & bound iterative (QBB-I)", Smart Innovation, Systems and Technologies, vol. 27, pp. 125-136, 2014.