Publication Type : Book Chapter
Publisher : IEEE
Source : International Conference on Computing Communication and Networking Technologies (ICCCNT)
Url :
Campus : Amritapuri
School : School of Computing
Year : 2021
Abstract : OCR, an acronym for “Optical Character Recognition”, is a system that automatically extracts the required information from scanned images of typewritten or printed text by translating them into machine-encoded text. OCR is now embedded in many applications and websites, but most of these systems work only for Latin-based scripts such as English. India is a multilingual country with more than 19,500 languages or dialects spoken as mother tongues. Owing to this diversity, few OCR works have been reported for Indian languages. Most Indian languages have large character sets that are structurally more complex than Latin-based scripts. Transfer learning from Latin-based OCR systems to Telugu is hence a difficult undertaking. Neural networks are well equipped to meet the difficulty of Telugu OCR. This work aims to develop a multilingual translation OCR system that can recognize basic printed text in the Telugu script.
Cite this Research Publication : Abhishek, Baratam Vijaya Sai, K. Yamuna, and T. Anjali. "Multilingual translational optical character recognition system for printed Telugu text." In 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1-5. IEEE, 2021.