Publication Type : Journal Article
Publisher : Expert Systems with Applications
Source : Expert Systems with Applications, Elsevier Ltd, Volume 73, p.11-26 (2017)
Keywords : Concept extraction, extraction, Genetic algorithms, Idea plagiarism, Intellectual property, Plagiarism detection, Semantic concept, Semantics, Similarity metrics, Structural levels, Summary obfuscation, Syntactic information, Syntactics, Text processing
Campus : Bengaluru
School : Department of Computer Science and Engineering, School of Engineering
Department : Computer Science
Year : 2017
Abstract : Plagiarism is increasingly becoming a major issue in the academic and educational domains. Automated and effective plagiarism detection systems are direly required to curtail this information breach, especially in tackling idea plagiarism. The proposed approach aims to detect such plagiarism cases, where the idea of a third party is adopted and presented intelligently so that at the surface level, plagiarism cannot be unmasked. The reported work aims to explore syntax-semantic concept extractions with a genetic algorithm in detecting cases of idea plagiarism. The work mainly focuses on idea plagiarism where the source ideas are plagiarized and represented in a summarized form. Plagiarism detection is employed at both the document and passage levels by exploiting the document concepts at various structural levels. Initially, the idea embedded within the given source document is captured using sentence level concept extraction with a genetic algorithm. Document level detection is facilitated with word-level concepts where syntactic information is extracted and the non-plagiarized documents are pruned. A combined similarity metric that utilizes the semantic level concept extraction is then employed for passage level detection. The proposed approach is tested on the PAN13-14 plagiarism corpus (http://pan.webis.de/) for summary obfuscation data, which represents a challenging case of idea plagiarism. The performance of the current approach and its variations is evaluated both at the document and passage levels, using information retrieval and PAN plagiarism measures respectively. The results are also compared against six top ranked plagiarism detection systems submitted as part of the PAN13-14 competition. The results obtained are found to exhibit significant improvement over the compared systems and hence reflect the potency of the proposed syntax-semantic based concept extractions in detecting idea plagiarism. © 2016 Elsevier Ltd
Cite this Research Publication : V. V and Dr. Deepa Gupta, “Detection of idea plagiarism using syntax–semantic concept extractions with genetic algorithm”, Expert Systems with Applications, vol. 73, pp. 11-26, 2017.