DocumentCode
2639464
Title
Document image summarization without OCR
Author
Bloomberg, Dan S. ; Chen, Francine R.
Author_Institution
Xerox Palo Alto Res. Center, CA, USA
Volume
1
fYear
1996
fDate
16-19 Sep 1996
Firstpage
229
Abstract
A system for selecting excerpts directly from imaged text without performing optical character recognition is described. The images are segmented to find text regions, text lines and words, and sentence and paragraph boundaries are identified. A set of word equivalence classes is computed based on the rank blur hit-miss transform. This information is used to identify stop words and keywords. Sentences for presentation as part of a summary are then selected based on keywords and on the location of the sentences.
Keywords
document image processing; image segmentation; transforms; document image summarization; imaged text; keywords; paragraph boundaries; rank blur hit-miss transform; sentence; stop word identification; text lines; text regions; word equivalence classes; words; Character generation; Character recognition; Data mining; Graphics; Image analysis; Image processing; Image segmentation; Natural languages; Optical character recognition software; Shape
fLanguage
English
Publisher
IEEE
Conference_Titel
Image Processing, 1996. Proceedings., International Conference on
Conference_Location
Lausanne
Print_ISBN
0-7803-3259-8
Type
conf
DOI
10.1109/ICIP.1996.560744
Filename
560744