DocumentCode
870006
Title
Machine printed text and handwriting identification in noisy document images
Author
Zheng, Yefeng ; Li, Huiping ; Doermann, David
Author_Institution
Inst. for Adv. Comput. Studies, Maryland Univ., College Park, MD, USA
Volume
26
Issue
3
fYear
2004
fDate
3/1/2004 12:00:00 AM
Firstpage
337
Lastpage
353
Abstract
In this paper, we address the problem of identifying text in noisy document images. We focus on segmenting handwriting and machine printed text and distinguishing between the two because: 1) handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content, and 2) the segmentation and recognition techniques required for machine printed and handwritten text are significantly different. A novel aspect of our approach is that we treat noise as a separate class and model it based on selected features. Trained Fisher classifiers are used to distinguish machine printed text and handwriting from noise, and we further exploit context to refine the classification. A Markov Random Field (MRF)-based approach is used to model the geometrical structure of the printed text, handwriting, and noise to rectify misclassifications. Experimental results show that our approach is robust and can significantly improve page segmentation in noisy document collections.
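The abstract names Fisher classifiers but this record does not give the features or training procedure used in the paper. As a rough illustration only, the following is a minimal sketch of a generic two-class Fisher linear discriminant on two-dimensional feature vectors. The function names (`fit_fisher`, `classify`), the two unnamed features, and the toy sample values are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def dot(u, v):
    """Dot product of two 2-D vectors."""
    return u[0] * v[0] + u[1] * v[1]


def fit_fisher(class_a, class_b):
    """Fit a two-class Fisher linear discriminant on 2-D points.

    Returns (w, t): the projection direction and a decision threshold.
    Points whose projection onto w exceeds t are assigned to class_a.
    """
    # Per-class mean vectors.
    ma = [mean(p[i] for p in class_a) for i in range(2)]
    mb = [mean(p[i] for p in class_b) for i in range(2)]

    # Within-class scatter matrix S_w, summed over both classes.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points, m in ((class_a, ma), (class_b, mb)):
        for p in points:
            d = (p[0] - m[0], p[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]

    # Invert the 2x2 scatter matrix in closed form.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]

    # Fisher direction: w = S_w^{-1} (m_a - m_b).
    diff = (ma[0] - mb[0], ma[1] - mb[1])
    w = (dot(inv[0], diff), dot(inv[1], diff))

    # Simple threshold: midpoint between the projected class means.
    t = 0.5 * (dot(w, ma) + dot(w, mb))
    return w, t


def classify(x, w, t, labels=("printed", "noise")):
    """Label a feature vector by which side of the threshold it projects to."""
    return labels[0] if dot(w, x) > t else labels[1]


# Hypothetical toy feature vectors, invented for illustration.
printed = [(1.0, 0.2), (1.2, 0.1), (0.9, 0.3), (1.1, 0.25)]
noise = [(0.2, 0.9), (0.3, 1.1), (0.1, 0.8), (0.25, 1.0)]
w, t = fit_fisher(printed, noise)
```

With these toy values, `classify((1.0, 0.2), w, t)` falls on the "printed" side and `classify((0.2, 1.0), w, t)` on the "noise" side. A three-class setting such as the one the abstract describes (printed text, handwriting, noise) would need several such discriminants or a multi-class variant; how the paper combines them is not stated here.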
Keywords
Markov processes; document image processing; feature extraction; handwriting recognition; image enhancement; image segmentation; text analysis; Markov random field; Fisher classifiers; handwriting identification; machine printed text; noisy document images; page segmentation; recognition techniques; text identification; Context modeling; Handwriting recognition; Image analysis; Image enhancement; Image segmentation; Markov random fields; Noise robustness; Solid modeling; Text analysis; Text recognition; Algorithms; Artificial Intelligence; Automatic Data Processing; Computer Graphics; Documentation; Image Enhancement; Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Models, Statistical; Numerical Analysis, Computer-Assisted; Pattern Recognition, Automated; Reading; Reproducibility of Results; Sensitivity and Specificity; Signal Processing, Computer-Assisted; Stochastic Processes; Subtraction Technique; User-Computer Interface; Writing
fLanguage
English
Journal_Title
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher
IEEE
ISSN
0162-8828
Type
jour
DOI
10.1109/TPAMI.2004.1262324
Filename
1262324