DocumentCode :
2900627
Title :
Understanding smeared documents
Author :
Johnson, R.B.
Author_Institution :
Dept. of Electr. & Electron. Eng., Bristol Univ., UK
fYear :
1999
fDate :
1999
Firstpage :
42401
Lastpage :
42406
Abstract :
The paper investigates strategies for handling smeared documents. Smeared documents are taken to be documents that have acquired imprints from sources not involved in the original preparation of the document, or as a result of the limitations of the conversion to digital format. Examples include smudges, blemishes or dirt from manhandling, neglect or other sources such as the bases of coffee mugs. It is suggested that a template of the original document serve as the basis for recognising the class of document, such as a form or a multi-column page of text. Classification can take place according to the layout form and textual content. The modules of such a system could include deskewing, text/graphics segmentation, OCR, page layout and document understanding. The paper highlights a few of the strategies that have been proposed to carry out these tasks. For example, a hierarchical Hough transform is suggested for deskewing as it is robust against noise. An optimised segmentation scheme has been proposed that produces partitioned blocks, which are then classified in a goal-driven manner using a decision tree. The paper also highlights the limitations of the current system, which converts scanned schematics to textual description files. Although it was originally designed for interpreting circuit diagrams, it is argued that it can be adapted into a form reader, provided a document type classifier is implemented.
Keywords :
document image processing; OCR; decision tree; deskewing; document template; document type classifier; form reader; hierarchical Hough transform; layout form; multi-column page; optimised segmentation scheme; page layout; partitioned blocks; scanned schematics; smeared document understanding; text/graphics segmentation; textual content; textual description files;
fLanguage :
English
Publisher :
IET
Conference_Titel :
Document Image Processing and Multimedia (Ref. No. 1999/041), IEE Colloquium on
Conference_Location :
London
Type :
conf
DOI :
10.1049/ic:19990202
Filename :
773123