Publication Type : Journal Article
Publisher : International Journal of Applied Engineering Research (IJAER)
Source : International Journal of Applied Engineering Research (IJAER), 10(3):7325-7334 (2015)
Campus : Chennai
School : School of Engineering
Department : Computer Science
Year : 2015
Abstract : Electronic data plays a major role in many applications, such as banking, catalog maintenance, and library management. Collecting large amounts of data from many distributed and heterogeneous sources causes data quality problems such as duplicates. Such data is normally stored in relational or hierarchical form, and XML is one hierarchical way of representing data. Few solutions are available for duplicate detection in hierarchical data. A recent approach to XML duplicate detection, called XMLDup, uses a Bayesian network to determine the probability that two XML elements are duplicates. It considers both the similarity of attribute content and the relative importance of descendant elements with respect to the overall similarity score, which is calculated using an edit-based distance function. Although edit-based distances are a well-known family of tree distance functions, they have several drawbacks in their mapping rules. This paper proposes a new similarity function for XML data comparison, namely Extended Sub Tree (EST), which introduces a new subtree mapping in order to identify duplicates between two different XML data sets.
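Illustrative sketch : The abstract does not give the EST mapping rules, so the Python below is not the authors' function. It is a deliberately simplified stand-in that shows the general idea of scoring two XML records by the identical subtrees they share instead of by node-level edit operations. The function names, the Dice-style score, and the sample records are all assumptions made for illustration; attributes are ignored and sibling order is treated as irrelevant.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def canonical(elem, bag):
    """Build a canonical string for the subtree rooted at elem and count it in bag.

    Assumption: text is compared case-insensitively and child order is ignored.
    """
    text = (elem.text or "").strip().lower()
    children = sorted(canonical(child, bag) for child in elem)
    sig = f"<{elem.tag}>{text}{''.join(children)}</{elem.tag}>"
    bag[sig] += 1
    return sig


def subtree_bag(xml_string):
    """Return a multiset of canonical signatures, one per subtree in the document."""
    bag = Counter()
    canonical(ET.fromstring(xml_string), bag)
    return bag


def subtree_similarity(xml_a, xml_b):
    """Dice-style overlap of identical subtrees (hypothetical score, not EST)."""
    a, b = subtree_bag(xml_a), subtree_bag(xml_b)
    shared = sum((a & b).values())  # multiset intersection
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total


# Hypothetical sample records: same book, different child order and year.
rec_a = "<book><title>XML Basics</title><author>Ann Lee</author><year>2015</year></book>"
rec_b = "<book><author>Ann Lee</author><title>XML Basics</title><year>2014</year></book>"

score = subtree_similarity(rec_a, rec_b)
```

In this toy setup a pair would be flagged as a candidate duplicate when the score exceeds a chosen threshold; the threshold, like the score itself, would need tuning on real data.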
Cite this Research Publication : G. Bharathi Mohan, T. Ravi, “Duplicate Detection in XML Data using Extended Sub-Tree Similarity function”, International Journal of Applied Engineering Research (IJAER), 10(3):7325-7334 (2015)