• DocumentCode
    1579906
  • Title
    Fine-grained document genre classification using first order random graphs

  • Author
    Bagdanov, Andrew D. ; Worring, Marcel
  • Author_Institution
    Intelligent Sensory Inf. Syst., Amsterdam Univ., Netherlands
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    79
  • Lastpage
    83
  • Abstract
    We approach the general problem of classifying machine-printed documents into genres. Layout is a critical factor in recognizing fine-grained genres, because document content features are similar across such genres. Document genre is determined from the layout structure detected in scanned binary images of the document pages, using no OCR results and minimal a priori knowledge of document logical structures. Our method uses attributed relational graphs (ARGs) to represent the layout structure of document instances, and first order random graphs (FORGs) to represent document genres. In this paper we develop our FORG-based genre classification method and present a comparative evaluation between our technique and a variety of statistical pattern classifiers. FORGs are capable of modeling common layout structure within a document genre and are shown to significantly outperform traditional pattern classification techniques when fine-grained genre distinctions must be drawn.
  • Keywords
    document image processing; graph theory; pattern classification; probability; attributed relational graphs; document genre classification; document image understanding; first order random graphs; probability distribution; Automation; Data mining; Information analysis; Information systems; Intelligent sensors; Intelligent systems; Machine intelligence; Optical character recognition software; Performance analysis; Text analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
  • Conference_Location
    Seattle, WA
  • Print_ISBN
    0-7695-1263-1
  • Type
    conf
  • DOI
    10.1109/ICDAR.2001.953759
  • Filename
    953759