DocumentCode :
1635649
Title :
Application of Multi-Level Classifiers and Clustering for Automatic Word Spotting in Historical Document Images
Author :
Moghaddam, Reza Farrahi ; Cheriet, Mohamed
Author_Institution :
Synchromedia Lab. for Multimedia Commun. in Telepresence, Ecole de Technol. Super., Montreal, QC, Canada
fYear :
2009
Firstpage :
511
Lastpage :
515
Abstract :
A complete system for preprocessing and word spotting of very old historical document images is presented. Document images are processed to extract salient information using a word spotting technique that requires no line or word segmentation and is language independent. A multi-class library of connected components of document text is created based on six features. Spotting is performed using a Euclidean distance measure enhanced by rotation and dynamic time warping transforms. The method is applied to a dataset from the Juma Al Majid Center (Dubai) with promising results. This performance is obtained using an automatic preprocessing stage in which content-level classifiers extract accurate stroke pixels in a robust way. The preprocessed document images are also more legible to the end user and are less costly to archive and transfer.
Keywords :
document image processing; feature extraction; history; image classification; pattern clustering; Euclidean distance measure; Juma Al Majid Center; clustering technique; content-level classifier; dynamic time warping transform; historical document image; word spotting technique; Euclidean distance; Image analysis; Image recognition; Image restoration; Image segmentation; Laboratories; Multimedia communication; Robustness; Text analysis; Wrapping
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
ISSN :
1520-5363
Print_ISBN :
978-1-4244-4500-4
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2009.104
Filename :
5277605