DocumentCode
1635649
Title
Application of Multi-Level Classifiers and Clustering for Automatic Word Spotting in Historical Document Images
Author
Moghaddam, Reza Farrahi ; Cheriet, Mohamed
Author_Institution
Synchromedia Lab. for Multimedia Commun. in Telepresence, Ecole de Technol. Super., Montreal, QC, Canada
fYear
2009
Firstpage
511
Lastpage
515
Abstract
A complete system for the preprocessing and word spotting of very old historical document images is presented. Salient information is extracted from the document images using a word spotting technique that requires neither line nor word segmentation and is language independent. A multi-class library of the connected components of the document text is created based on six features. Spotting is performed using a Euclidean distance measure enhanced by rotation and dynamic time warping transforms. The method is applied to a dataset from the Juma Al Majid Center (Dubai), with promising results. This word spotting performance is obtained with the help of an automatic preprocessing stage in which content-level classifiers extract accurate stroke pixels in a robust way. The preprocessed document images are also more legible to the end user and less costly to archive and transfer.
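The abstract names dynamic time warping (DTW) as one of the enhancements to the Euclidean spotting distance but gives no formulation. As a minimal sketch only, assuming each connected component is described by a sequence of feature vectors (a hypothetical representation, not the paper's six features), a classic DTW cost with a Euclidean local distance could look like the following. The names `euclidean` and `dtw_distance` are illustrative, and the rotation enhancement is not shown.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(s: Sequence[Vector], t: Sequence[Vector]) -> float:
    """Classic DTW cost between two sequences of feature vectors.

    Hypothetical illustration: Euclidean distance is used as the local
    cost, with no window constraint and no length normalisation.
    """
    n, m = len(s), len(t)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning s[:i] with t[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(s[i - 1], t[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # s[i-1] also aligned with an earlier t element
                cost[i][j - 1],      # t[j-1] also aligned with an earlier s element
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]


if __name__ == "__main__":
    query = [[0.0], [1.0], [2.0]]
    stretched = [[0.0], [1.0], [1.0], [2.0]]  # same shape, one sample repeated
    print(dtw_distance(query, stretched))
```

Unlike a plain Euclidean distance, which needs sequences of equal length, DTW lets a stretched instance of the same shape (here, with one sample repeated) align at zero cost. This tolerance to local stretching is the usual reason for adding it to a word spotting distance.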
Keywords
document image processing; feature extraction; history; image classification; pattern clustering; Euclidean distance measure; Juma Al Majid Center; clustering technique; content-level classifier; dynamic time warping transform; historical document image; word spotting technique; Euclidean distance; Image analysis; Image recognition; Image restoration; Image segmentation; Laboratories; Multimedia communication; Robustness; Text analysis; Wrapping;
fLanguage
English
Publisher
ieee
Conference_Titel
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location
Barcelona
ISSN
1520-5363
Print_ISBN
978-1-4244-4500-4
Electronic_ISBN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2009.104
Filename
5277605