• DocumentCode
    1635649
  • Title
    Application of Multi-Level Classifiers and Clustering for Automatic Word Spotting in Historical Document Images
  • Author
    Moghaddam, Reza Farrahi ; Cheriet, Mohamed
  • Author_Institution
    Synchromedia Lab. for Multimedia Commun. in Telepresence, Ecole de Technol. Super., Montreal, QC, Canada
  • fYear
    2009
  • Firstpage
    511
  • Lastpage
    515
  • Abstract
    A complete system for preprocessing and word spotting of very old historical document images is presented. Document images are processed for extraction of salient information using a word spotting technique which does not need line and word segmentation and is language independent. A multi-class library of connected components of document text is created based on six features. The spotting is performed using a Euclidean distance measure enhanced by rotation and dynamic time warping transforms. The method is applied to a dataset from the Juma Al Majid Center (Dubai) with promising results. This performance of the word spotting technique is obtained using an automatic preprocessing stage. In this stage, using content-level classifiers, accurate stroke pixels are extracted in a robust way. The preprocessed document images are also more legible to the end user and are less costly to archive and transfer.
  • Keywords
    document image processing; feature extraction; history; image classification; pattern clustering; Euclidean distance measure; Juma Al Majid Center; clustering technique; content-level classifier; dynamic time warping transform; historical document image; word spotting technique; Euclidean distance; Image analysis; Image recognition; Image restoration; Image segmentation; Laboratories; Multimedia communication; Robustness; Text analysis; Wrapping
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4244-4500-4
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2009.104
  • Filename
    5277605
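
The abstract mentions matching with a Euclidean distance measure enhanced by dynamic time warping. As a rough illustration only, the sketch below shows the textbook dynamic time warping recurrence with a Euclidean local cost. It is not the authors' implementation: the function name `dtw_distance` and the idea of representing each connected component as a sequence of feature vectors are assumptions made for this example, and the rotation step from the abstract is not covered.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Textbook dynamic time warping cost between two sequences.

    Each sequence is a list of equal-length numeric tuples (hypothetical
    per-column feature vectors). The local cost is the Euclidean distance
    between two vectors; the result is the minimal accumulated cost over
    all monotonic alignments of the two sequences.
    """
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_a advances, seq_b element repeated
                cost[i][j - 1],      # seq_b advances, seq_a element repeated
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


if __name__ == "__main__":
    # Made-up one-dimensional profiles: the second is a stretched copy
    # of the first, so warping should absorb the length difference.
    query = [(0.0,), (1.0,), (2.0,)]
    candidate = [(0.0,), (1.0,), (1.0,), (2.0,)]
    print(dtw_distance(query, candidate))
```

Unlike a plain point-by-point Euclidean comparison, this recurrence tolerates sequences of different lengths, which is the usual motivation for combining the two measures when word shapes are stretched or compressed.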