• DocumentCode
    2142900
  • Title

    Novel Data Representation for Text Extraction from Multispectral Historical Document Images

  • Author

    Hedjam, Rachid ; Cheriet, Mohamed

  • Author_Institution
    Synchromedia Lab. for Multimedia Commun. in Telepresence, Montreal, QC, Canada
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    172
  • Lastpage
    176
  • Abstract
    The extraction and analysis of useful information from old document images is very important for cultural heritage preservation. In advanced research, where the goal is to separate the foreground (in general, text) from the background, image restoration and pattern classification techniques are used. Most of these methods classify pixels based on their gray-scale values. In this paper, we propose to perform foreground pattern extraction using regions-of-interest (ROI) analysis and a maximum likelihood classifier designed for multispectral document images. As a contribution, a new feature vector is proposed to improve discrimination between patterns; it is embedded in a simple statistical classification method. The results, which are promising, are compared to the state of the art.
  • Keywords
    document image processing; feature extraction; image classification; maximum likelihood estimation; statistical analysis; cultural heritage preservation; data representation; feature vector; foreground pattern extraction; maximum likelihood classifier; multispectral historical document image; regions-of-interest analysis; statistical classification method; text extraction; Cultural differences; Data mining; Feature extraction; Imaging; Materials; Vectors; Degraded document images; Document image binarization; Feature vector; Historical document images; Multispectral imaging; Pattern persistence; Tensor-based energy;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2011.43
  • Filename
    6065298