Publication Type : Conference Paper
Publisher : International Conference on Data Science and Engineering, ICDSE 2014
Source : International Conference on Data Science and Engineering, ICDSE 2014, Institute of Electrical and Electronics Engineers Inc., p.105-111 (2014)
ISBN : 9781479968701
Keywords : Data integration, Heterogeneous database, High-dimensional, integration, Iterative methods, Linear regression, Linear subspace, Metadata, Multiple linear regressions, Regression analysis, Relational schemas, Schema information, Semantic integration, Semantics, Singular value decomposition, Technology advances
Campus : Amritapuri
School : Department of Computer Science and Engineering, School of Computing
Center : AI (Artificial Intelligence) and Distributed Systems
Department : Computer Science
Verified : Yes
Year : 2014
Abstract : Semantic integration of heterogeneous databases is a critical area of interest, driven by growing data volumes and the need to share existing data as technology advances. Schema-level heterogeneity of the relations is the major obstacle to such integration. Although various approaches to schema analysis, transformation and integration have been explored, they are sometimes too general to solve the problem, especially when the data is very high-dimensional and the schema information is unavailable or inadequate. In this paper, a method is proposed to integrate heterogeneous relational schemas at the instance level rather than the schema level. A global schema is designed by integrating the most relevant attributes of the different relational schemas of a particular domain. To find the significant attributes, multiple linear regression based on the L1 norm and Singular Value Decomposition (SVD) are applied to the data iteratively. This is a variant of L1-PCA, an efficient, effective and meaningful method of linear subspace estimation. The most prominent instance-level similarity is found by identifying the most significant attributes of each relational data source and then measuring the similarity among those attributes using the L1 norm. Thus an integrated schema is created that maps the relevant attributes of each local schema to a global schema. © 2014 IEEE.
Cite this Research Publication : Sandhya Harikumar, Reethima, R., and Dr. Kaimal, M. R., “Semantic integration of heterogeneous relational schemas using multiple L1 linear regression and SVD”, in International Conference on Data Science and Engineering, ICDSE 2014, 2014, pp. 105-111
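Illustration : The abstract names L1-norm linear regression as one building block of the method. The sketch below is not taken from the paper; it only shows what an L1 (least absolute deviations) fit might look like, assuming an iteratively reweighted least squares (IRLS) solver. The function name `l1_regression` and the parameters `n_iter` and `eps` are hypothetical. Each inner least-squares step uses `numpy.linalg.lstsq`, which is SVD-based, but this is not claimed to be how the authors combine regression and SVD.

```python
import numpy as np


def l1_regression(X, y, n_iter=100, eps=1e-8):
    """Approximate argmin_w ||X w - y||_1 with IRLS (an assumed solver).

    X: (n_samples, n_features) design matrix, y: (n_samples,) target.
    """
    # Start from the ordinary least-squares solution.
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        # Absolute residuals of the current fit, floored to avoid division by zero.
        r = np.maximum(np.abs(X @ w - y), eps)
        # IRLS weights are 1/|r|; scaling rows by sqrt(1/|r|) turns the
        # weighted problem into a plain least-squares problem.
        s = 1.0 / np.sqrt(r)
        w = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)[0]
    return w


if __name__ == "__main__":
    # Toy data (made up for illustration): y = 1 + 2x with one gross outlier.
    x = np.arange(5.0)
    X = np.column_stack([np.ones_like(x), x])
    y = np.array([1.0, 3.0, 5.0, 7.0, 100.0])
    print("L2 fit:", np.linalg.lstsq(X, y, rcond=None)[0])
    print("L1 fit:", l1_regression(X, y))
```

On this toy data the L1 fit should stay close to intercept 1 and slope 2, while the ordinary least-squares fit is pulled strongly toward the outlier. That robustness is the usual motivation for preferring the L1 norm on noisy instance data.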