  • DocumentCode
    2639464
  • Title
    Document image summarization without OCR
  • Author
    Bloomberg, Dan S.; Chen, Francine R.
  • Author_Institution
    Xerox Palo Alto Research Center, CA, USA
  • Volume
    1
  • fYear
    1996
  • fDate
    16-19 Sep 1996
  • Firstpage
    229
  • Abstract
    This paper describes a system that selects excerpts directly from imaged text without performing optical character recognition (OCR). The images are segmented into text regions, text lines, and words, and sentence and paragraph boundaries are identified. A set of word equivalence classes is computed using the rank blur hit-miss transform. This information is used to identify stop words and keywords. Sentences for inclusion in a summary are then selected based on the keywords they contain and on their location.
  • Keywords
    document image processing; image segmentation; transforms; document image summarization; imaged text; keywords; paragraph boundaries; rank blur hit-miss transform; sentence; stop word identification; text lines; text regions; word equivalence classes; words; Character generation; Character recognition; Data mining; Graphics; Image analysis; Image processing; Image segmentation; Natural languages; Optical character recognition software; Shape
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the 1996 International Conference on Image Processing
  • Conference_Location
    Lausanne
  • Print_ISBN
    0-7803-3259-8
  • Type
    conf
  • DOI
    10.1109/ICIP.1996.560744
  • Filename
    560744
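The abstract's final steps (identifying stop words and keywords from word equivalence classes, then choosing sentences by keyword content and location) can be illustrated with a minimal sketch. This is not the authors' method; it rests on several assumptions. Each word image is assumed to have already been mapped to an integer equivalence-class id (the image-matching step, such as the rank blur hit-miss transform, is not shown). The function name `select_summary_sentences`, the frequency-rank cutoffs `n_stop` and `n_keywords`, and the `lead_bonus` for paragraph-initial sentences are hypothetical choices made for illustration.

```python
from collections import Counter


def select_summary_sentences(paragraphs, n_stop=3, n_keywords=3,
                             n_select=2, lead_bonus=1.0):
    """Hypothetical sketch: pick summary sentences from word-class ids.

    paragraphs: list of paragraphs; each paragraph is a list of sentences;
    each sentence is a list of integer equivalence-class ids, one per word
    image (assumed to come from an earlier image-matching step).
    Returns (stop_words, keywords, chosen), where chosen is a sorted list of
    (paragraph_index, sentence_index) pairs.
    """
    # Count how often each word equivalence class occurs in the document.
    counts = Counter(cid for para in paragraphs for sent in para for cid in sent)
    # Classes ordered by frequency; ties keep first-occurrence order.
    ranked = [cid for cid, _ in counts.most_common()]

    # Assumed heuristic: the most frequent classes are stop words and the
    # next most frequent classes are keywords.
    stop_words = set(ranked[:n_stop])
    keywords = set(ranked[n_stop:n_stop + n_keywords])

    scored = []
    for p_idx, para in enumerate(paragraphs):
        for s_idx, sent in enumerate(para):
            # Keyword evidence: number of keyword occurrences in the sentence.
            score = sum(1 for cid in sent if cid in keywords)
            # Location evidence (assumed): favour paragraph-opening sentences.
            if s_idx == 0:
                score += lead_bonus
            scored.append((score, p_idx, s_idx))

    # Highest score first; ties broken by document order.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    # Present the selected sentences in their original reading order.
    chosen = sorted((p, s) for _, p, s in scored[:n_select])
    return stop_words, keywords, chosen


if __name__ == "__main__":
    # Two paragraphs of two sentences each; ids 0, 1, 2 behave like
    # frequent function words, ids 7 and 8 like content words.
    doc = [
        [[0, 7, 1, 8], [0, 5, 2]],
        [[0, 6, 1, 2], [0, 7, 8, 1, 2]],
    ]
    print(select_summary_sentences(doc, n_stop=3, n_keywords=2))
```

With the sample document, classes 0, 1 and 2 are treated as stop words, 7 and 8 as keywords, and the first sentence of the first paragraph plus the second sentence of the second paragraph are selected. Ranking purely by frequency is the simplest possible criterion; a real system working on word images could also weigh other cues, such as word-image size, when separating stop words from keywords.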