Title :
An extended method for recognition of broken typewritten characters special reference to Tamil script
Author :
Abubacker, Nirase Fathima ; Gandhi, Raman Indra
Author_Institution :
Sch. of Inf. Technol., City Univ. Coll. of Sci. & Technol., Kuala Lumpur, Malaysia
Abstract :
Preparing clean and clear images for recognition engines is often taken for granted as a trivial task that requires little attention. Most existing OCR systems have been designed to correctly identify finely printed documents in all scripts. A standard machine-printed OCR system fails, however, when it is tested on documents with distorted characters. This paper presents an approach to overcome the difficulties posed by such distorted typewritten documents, especially those with broken characters. As a first step, each character is isolated using character position location and character localization and is enclosed in a matrix, which is analyzed and repaired in the later part of our study. A shape and line tracing method is then applied to recognize the distorted, broken characters, and the result is fine-tuned using lexical knowledge.
Keywords :
document image processing; natural language processing; optical character recognition; OCR; broken typewritten characters special reference; character localization; character position location; distorted characters; extended method; lexical knowledge; printed documents; Tamil script; Accuracy; Character recognition; Conferences; Feature extraction; Open systems; Shape; Support vector machine classification; Broken Tamil; Distorted Characters; Line Tracing; Localization; Shape Tracing
Conference_Title :
Open Systems (ICOS), 2011 IEEE Conference on
Conference_Location :
Langkawi
Print_ISBN :
978-1-61284-931-7
DOI :
10.1109/ICOS.2011.6079265