Publication Type : Conference Proceedings
Publisher : Fourth International Conference on Intelligent Sensing and Information Processing
Source : 2006 Fourth International Conference on Intelligent Sensing and Information Processing, pp. 53-58 (2006)
Campus : Bengaluru
School : Department of Computer Science and Engineering, School of Engineering
Department : Computer Science
Year : 2006
Abstract : Separating printed text blocks from non-text areas containing signatures, handwritten text, logos and other such symbols is a necessary first step for an OCR system that recognizes printed text. In the present work, we compare the efficacy of several feature-classifier combinations for this separation task. We select the length-normalized horizontal projection profile (HPP) as the starting point, on the assumption that printed text blocks contain lines of text that generate HPPs with some regularity. This assumption is demonstrated to be valid. Our features are the HPP and two transformed versions of it, namely the eigen and Fisher profiles. Four well-known classifiers, namely nearest neighbor, linear discriminant function, SVMs and artificial neural networks, are considered, and the efficiency of combining each of these classifiers with the above features is compared. A sequential floating feature selection technique is adopted to enhance the efficiency of the separation task. The results give an average accuracy of about 96%.
Cite this Research Publication : K. R. Arvind, Peeta Basa Pati, and A. G. Ramakrishnan, “Automatic text block separation in document images”, 2006 Fourth International Conference on Intelligent Sensing and Information Processing, pp. 53-58, 2006.
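
Illustration : The abstract names the length-normalized horizontal projection profile (HPP) as the starting feature. The sketch below is not the authors' implementation; it only shows one plausible reading of the idea. It assumes a block is a binary image given as a list of rows (1 = ink, 0 = background), and it assumes that "length-normalized" means resampling the profile to a fixed number of bins by linear interpolation. The function names, the `n_bins` parameter and the toy block are all hypothetical.

```python
def horizontal_projection_profile(block):
    """Count ink pixels (value 1) in each row of a binary image block."""
    return [sum(row) for row in block]


def length_normalize(profile, n_bins):
    """Resample a profile to n_bins values by linear interpolation.

    Assumption: 'length-normalized' is read here as 'resampled to a
    fixed length'; the paper may define the normalization differently.
    """
    if not profile:
        raise ValueError("profile must not be empty")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if len(profile) == 1 or n_bins == 1:
        # Nothing to interpolate between: repeat the first value.
        return [float(profile[0])] * n_bins
    step = (len(profile) - 1) / (n_bins - 1)
    resampled = []
    for i in range(n_bins):
        pos = i * step                       # fractional row index
        lo = int(pos)
        hi = min(lo + 1, len(profile) - 1)   # clamp at the last row
        frac = pos - lo
        resampled.append(profile[lo] * (1 - frac) + profile[hi] * frac)
    return resampled


# Toy block: two "text lines" separated by blank rows (hypothetical data).
block = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
hpp = horizontal_projection_profile(block)
feature = length_normalize(hpp, n_bins=4)
```

In this toy block the profile rises on the rows that contain text and falls to zero on the blank rows between them, which is the kind of regularity the abstract attributes to printed text. A fixed-length vector such as `feature` could then be passed to any of the classifiers the abstract lists.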