Publisher : International Journal of Applied Engineering Research
Campus : Mysuru
School : School of Arts and Sciences
Department : Computer Science
Year : 2015
Abstract : The Internet is a pool of information containing billions of text documents, which are stored in compressed format. The literature offers many text classification algorithms, but they work on uncompressed text documents. Because the text data in web pages is stored in compressed format, the documents must be restored to their original format before data mining activities can be carried out, and this decompression consumes additional computational time. This work therefore presents a study of different text classification and clustering algorithms and compares them in the compressed domain. Various methods for representing text in the compressed domain are explained, and experiments are conducted on the LZW method for comparison. Different classification and clustering algorithms are also discussed, and a comparative analysis of all these methods is presented. © 2015, Research India Publications, All rights Reserved.