Title :
Application of Multi-Level Classifiers and Clustering for Automatic Word Spotting in Historical Document Images
Author :
Moghaddam, Reza Farrahi ; Cheriet, Mohamed
Author_Institution :
Synchromedia Lab. for Multimedia Commun. in Telepresence, Ecole de Technol. Super., Montreal, QC, Canada
Abstract :
A complete system for preprocessing and word spotting of very old historical document images is presented. Document images are processed to extract salient information using a word spotting technique that requires no line or word segmentation and is language independent. A multi-class library of connected components of the document text is created based on six features. Spotting is performed using a Euclidean distance measure enhanced by rotation and dynamic time warping transforms. The method is applied to a dataset from the Juma Al Majid Center (Dubai) with promising results. This word spotting performance is obtained using an automatic preprocessing stage, in which content-level classifiers extract accurate stroke pixels in a robust way. The preprocessed document images are also more legible to the end user and are less costly to archive and transfer.
Keywords :
document image processing; feature extraction; history; image classification; pattern clustering; Euclidean distance measure; Juma Al Majid Center; clustering technique; content-level classifier; dynamic time warping transform; historical document image; word spotting technique; Euclidean distance; Image analysis; Image recognition; Image restoration; Image segmentation; Laboratories; Multimedia communication; Robustness; Text analysis; Wrapping
Conference_Title :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-4500-4
Electronic_ISBN :
1520-5363
DOI :
10.1109/ICDAR.2009.104
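Illustration :
The abstract describes spotting by a Euclidean distance measure enhanced by dynamic time warping. The sketch below is not the authors' implementation. It shows a textbook dynamic time warping distance with a Euclidean local cost, plus a simple nearest-first ranking over a library of components. The function names `dtw_distance` and `rank_library`, the dictionary layout of the library, and the use of short tuples as feature vectors are all assumptions made for illustration. The six features and the rotation transform mentioned in the abstract are not modeled.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature tuples.

    The local cost is the Euclidean distance between aligned elements.
    This is the standard unconstrained recurrence, not the paper's variant.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b element is reused
                cost[i][j - 1],      # b advances, a element is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def rank_library(query, library):
    """Return library keys sorted by DTW distance to the query (closest first).

    `library` is a hypothetical mapping from a component label to its
    feature sequence.
    """
    return sorted(library, key=lambda name: dtw_distance(query, library[name]))


# Hypothetical one-dimensional feature sequences for three components.
library = {
    "comp_a": [(0.0,), (1.0,), (1.0,), (2.0,)],
    "comp_b": [(5.0,), (5.0,), (6.0,)],
    "comp_c": [(0.0,), (2.0,), (4.0,)],
}
query = [(0.0,), (1.0,), (2.0,)]
ranking = rank_library(query, library)
```

Because the warping path may reuse elements, `comp_a` matches the query at zero cost even though the two sequences have different lengths, which is the property that makes this measure tolerant of stretched or compressed strokes.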