• DocumentCode
    2012305
  • Title

    Parsing Tables by Probabilistic Modeling of Perceptual Cues

  • Author

    Bart, Evgeniy

  • Author_Institution
    Intell. Syst. Lab., Palo Alto Res. Center, Palo Alto, CA, USA
  • fYear
    2012
  • fDate
    27-29 March 2012
  • Firstpage
    409
  • Lastpage
    414
  • Abstract
    In this paper, we propose a method for automatically parsing images of tables, focusing in particular on 'simple' matrix-like tables with rectilinear layout. Such tables account for over 50% of tables in business documents. The main novelty of the proposed method is that it combines intrinsic properties of table cells with properties of cell separators, as well as table rows, columns, and layout, in a single global objective function. This is in contrast to previous methods which focused on either separators alone or intrinsic cell properties alone. Our method uses a variety of perceptual cues, such as alignment and saliency, to characterize these properties. Candidate parses are evaluated by comparing their likelihoods, and the parse that optimizes the likelihood is selected. The proposed approach deals successfully with a wide variety of tables, as illustrated on a dataset of over 1,000 images.
  • Keywords
    document image processing; grammars; probability; alignment; automatic parsing table images; business documents; cell separators; global objective function; likelihood optimization; matrix-like tables; perceptual cues; probabilistic modeling; saliency; table cell intrinsic properties; table columns; table rectilinear layout; table rows; Databases; Feature extraction; Layout; Optimization; Particle separators; Testing; Training; document analysis; table parsing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
  • Conference_Location
    Gold Coast, QLD
  • Print_ISBN
    978-1-4673-0868-7
  • Type

    conf

  • DOI
    10.1109/DAS.2012.67
  • Filename
    6195404