
SubspaceDB: In-database subspace clustering for analytical query processing

Publication Type : Journal Article

Publisher : Data and Knowledge Engineering, 2019, 121, pp. 109–129

Authors : Sandhya Harikumar, Kaimal, M.R.

Source : Data and Knowledge Engineering, 2019, 121, pp. 109–129

Campus : Amritapuri

School : Department of Computer Science and Engineering

Department : Computer Science

Year : 2019

Abstract : High-dimensional data analysis within relational database management systems (RDBMS) is challenging because of inadequate support from SQL. Currently, subspace clustering of high-dimensional data is implemented either outside the DBMS using wrapper code or inside the DBMS using SQL User Defined Functions/Aggregates (UDFs/UDAs). However, both of these approaches have potential disadvantages from performance, resource usage, and security perspectives for voluminous and frequently updated data. Hence, we propose an efficient querying system, named SubspaceDB, that implements subspace clustering directly within an RDBMS. SubspaceDB provides a novel set of query operators, each with an optimization objective, to facilitate interactive analysis for subspace clustering. The query operators focus on retrieving optimal answers to four key query types: (a) medoid queries, (b) neighbourhood queries, (c) partial similarity queries, and (d) prominence queries, which aid the formation of subspace clusters. Experimental studies on real and synthetic databases of size 15 M tuples and 104 attributes show that our proposed approach, SubspaceDB, can be over 10 times faster than a conventional wrapper-based or SQL UDF approach. The proposed approach is also efficient in retrieving at least 50% of the data, with a performance improvement of at least 25%.
