Publication Type : Conference Paper
Publisher : Proceedings of the 2016 2nd International Conference on Contemporary Computing and Informatics, IC3I 2016, Institute of Electrical and Electronics Engineers Inc.
Source : Proceedings of the 2016 2nd International Conference on Contemporary Computing and Informatics, IC3I 2016, Institute of Electrical and Electronics Engineers Inc., p.661-666 (2016)
ISBN : 9781509052554
Keywords : Analytical queries, Classification algorithm, Data mining, Database systems, Feature engineerings, large scale systems, Logistic regressions, Mining projects, Query processing, Real data sets, Regression analysis, Statistical packages, User Defined Functions
Campus : Amritapuri
School : Department of Computer Science and Engineering, School of Engineering
Center : AI (Artificial Intelligence) and Distributed Systems
Department : Computer Science
Year : 2016
Abstract : This paper presents an analytical query model for data categorization within a DBMS. Because the DBMS is a core asset for most organizations, classification can provide better insight into, and control over, the data. Conventionally, classification algorithms such as logistic regression and KNN are applied after exporting the data out of the DBMS, using non-DBMS tools such as R, matrix packages, generic data mining programs, or large-scale systems such as Hadoop and Spark. However, this leads to I/O overhead, since the data within the DBMS is updated frequently and usually cannot be accommodated in main memory. This paper proposes an alternative strategy, based on SQL and UDFs, that integrates logistic regression for data categorization and prediction query processing within the DBMS. SQL is compared with user-defined functions (UDFs) and with statistical packages such as R through experiments on real datasets. The empirical results show the viability and validity of this approach for predicting the class of a given query. © 2016 IEEE.
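As a rough illustration of the idea in the abstract (scoring a logistic regression model inside the database through SQL and a UDF, rather than exporting the data), the following is a minimal sketch only. It is not the authors' implementation: SQLite stands in for the DBMS, and the table `points`, its columns `x1` and `x2`, the UDF name `sigmoid`, and the weights `b0`, `b1`, `b2` are all hypothetical values chosen for the example.

```python
import math
import sqlite3


def sigmoid(z):
    """Logistic function; registered below as a scalar SQL UDF."""
    return 1.0 / (1.0 + math.exp(-z))


# In-memory SQLite database used as a stand-in DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.create_function("sigmoid", 1, sigmoid)

# Hypothetical table holding the rows to classify.
conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x1 REAL, x2 REAL)")
conn.executemany(
    "INSERT INTO points (x1, x2) VALUES (?, ?)",
    [(0.5, 1.0), (3.0, 2.5), (-1.0, 0.2)],
)

# Hypothetical, already-fitted coefficients (intercept and two weights).
b0, b1, b2 = -4.0, 1.0, 1.0

# Prediction query: the linear score, the probability, and the class label
# are all computed inside the database, so no rows are exported for scoring.
rows = conn.execute(
    """
    SELECT id,
           sigmoid(? + ? * x1 + ? * x2) AS p,
           CASE WHEN sigmoid(? + ? * x1 + ? * x2) >= 0.5 THEN 1 ELSE 0 END AS label
    FROM points
    ORDER BY id
    """,
    (b0, b1, b2, b0, b1, b2),
).fetchall()

for row_id, p, label in rows:
    print(row_id, round(p, 4), label)
```

The sketch covers only the prediction step with fixed weights. Fitting the weights inside the DBMS, which the abstract also refers to, would need an iterative procedure that is not shown here.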
Cite this Research Publication : J. Isaac and Sandhya Harikumar, “Logistic regression within DBMS”, in Proceedings of the 2016 2nd International Conference on Contemporary Computing and Informatics, IC3I 2016, 2016, pp. 661-666