Publication Type : Journal Article
Publisher : Journal of Theoretical and Applied Information Technology
Source : Journal of Theoretical and Applied Information Technology, Volume 86, Issue 2, Pages 223-231, 20 April 2016.
Url : https://www.scopus.com/inward/record.uri?eid=2-s2.0-84964253491&partnerID=40&md5=b65d99877a56916f7931aae90212f416
Keywords : Irrelevant information, Morphological operations, Non-scratched words, Pre-printed forms, Scratched words, Unsupervised learning
Campus : Mysuru
School : School of Arts and Sciences
Center : Technology Enabling Centre
Department : Computer Science
Year : 2016
Abstract : Pre-processing of document images is the factor that varies most from one type of document image to another. In general, document images require more intensive pre-processing procedures than other types of images; one such category is pre-printed form images. Pre-processing of such documents differs from that of other types of images, which contain simple text and are free from graphical components. This paper proposes a generic pre-processing algorithm adaptable to pre-printed application form images. The work focuses specifically on the problem of detecting and removing scratched words inherent in the text, since these elements can be interpreted neither by humans nor by machines. The algorithm exploits features such as Euler’s number, the number of connected components, and the area covered by holes within a text block to detect scratched-out text blocks. The algorithm has yielded reasonably good results, with an overall efficacy of around 96.5%. © 2005 - 2016 JATIT & LLS. All rights reserved.
Cite this Research Publication : Shobha Rani, N., Vasudev, T., Vineeth, P., Ajith, D., "An unsupervised classification technique for recognition of scratched and non-scratched words in pre-printed documents," Journal of Theoretical and Applied Information Technology, 86 (2), pp. 223-231, 2016.
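
As a rough illustration of the three features named in the abstract (Euler's number, number of connected components, and area covered by holes), the following Python sketch computes them for a small binary text block. It is not the authors' implementation: the function name `block_features`, the choice of 8-connectivity for ink pixels and 4-connectivity for background pixels, and the toy grids are all assumptions made for this example, and no classification rule or threshold from the paper is implied.

```python
from collections import deque

# Neighbour offsets: 4-connectivity and 8-connectivity (an assumed convention).
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def _flood(grid, seen, start, value, neighbours):
    """Breadth-first fill from `start` over cells equal to `value`; return them."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    seen[start[0]][start[1]] = True
    cells = []
    while queue:
        r, c = queue.popleft()
        cells.append((r, c))
        for dr, dc in neighbours:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not seen[nr][nc] and grid[nr][nc] == value):
                seen[nr][nc] = True
                queue.append((nr, nc))
    return cells


def block_features(grid):
    """Hypothetical helper: features of a binary block (1 = ink, 0 = background)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    components = holes = hole_area = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            if grid[r][c] == 1:
                # Each unvisited ink pixel starts a new foreground component.
                _flood(grid, seen, (r, c), 1, N8)
                components += 1
            else:
                # A background region that never reaches the border is a hole.
                cells = _flood(grid, seen, (r, c), 0, N4)
                on_border = any(rr in (0, rows - 1) or cc in (0, cols - 1)
                                for rr, cc in cells)
                if not on_border:
                    holes += 1
                    hole_area += len(cells)
    return {
        "components": components,
        "holes": holes,
        "hole_area": hole_area,
        "euler_number": components - holes,  # components minus holes
    }


# Toy block: a closed ring of ink, standing in for a character with one loop.
clean = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

# Same block with a horizontal stroke drawn across the middle row.
scratched = [row[:] for row in clean]
scratched[3] = [1] * 7

clean_features = block_features(clean)
scratched_features = block_features(scratched)
```

In this toy case the stroke splits the single enclosed region into two, so the hole count rises and the Euler number drops. Real scratched words would be noisier, and how the paper combines these features in its unsupervised scheme is not stated in the abstract.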