Publication Type : Journal Article
Publisher : Elsevier
Source : Data & Knowledge Engineering 121, 109-129, 2019. DOI: https://doi.org/10.1016/j.datak.2019.05.003
Url : https://www.sciencedirect.com/science/article/abs/pii/S0169023X17305633
Campus : Amritapuri
School : Department of Computer Science and Engineering
Center : AI (Artificial Intelligence) and Distributed Systems
Department : Computer Science
Year : 2019
Abstract : High-dimensional data analysis within relational database management systems (RDBMS) is challenging because SQL provides inadequate support for it. Currently, subspace clustering of high-dimensional data is implemented either outside the DBMS using wrapper code or inside the DBMS using SQL User Defined Functions/Aggregates (UDFs/UDAs). However, both approaches have potential disadvantages from the performance, resource usage, and security perspectives when data is voluminous and frequently updated. Hence, we propose an efficient querying system, named SubspaceDB, that implements subspace clustering directly within an RDBMS. SubspaceDB provides a novel set of query operators, each with an optimization objective, to facilitate interactive analysis for subspace clustering. The query operators focus on retrieving optimal answers to four key query types that aid the formation of subspace clusters: (a) Medoid queries, (b) Neighbourhood queries, (c) Partial similarity queries, and (d) Prominence queries. Experimental studies on real and synthetic databases of 15M tuples and 104 attributes show that SubspaceDB can be over 10 times faster than a conventional wrapper-based or SQL UDF approach. The proposed approach also retrieves at least 50% of the data with a performance improvement of at least 25%.
Cite this Research Publication : S Harikumar, MR Kaimal, "SubspaceDB: In-database subspace clustering for analytical query processing," Data & Knowledge Engineering 121, 109-129, 2019. DOI: https://doi.org/10.1016/j.datak.2019.05.003