DocumentCode :
2021012
Title :
Document Content Inventory and Retrieval
Author :
Baird, Henry S. ; Moll, Michael A.
Author_Institution :
Lehigh Univ., Bethlehem
Volume :
1
fYear :
2007
fDate :
23-26 Sept. 2007
Firstpage :
93
Lastpage :
97
Abstract :
We give an analysis of relationships between expected retrieval performance and classification recognition accuracy in the context of document image content extraction and inventory. By content extraction we mean location and measurement of regions containing handwriting, machine-printed text, photographs, blank space, etc., in documents represented as bilevel, grey-level, or color images. Recent experiments have shown that even modest per-pixel content classification accuracies can support usefully high recall and precision rates (e.g., 80-90%) for retrieval queries within document collections seeking pages that contain a given minimum fraction of a certain type of content. In an effort to elucidate this interesting empirical result, we have analyzed the interdependency of classification and retrieval under a variety of assumptions about the distribution of content types in document image collections. We show that under general conditions we cannot derive precision and recall measures from per-pixel classification measures alone, but we can estimate the expected values of these measures. If, however, the distribution of content and error rates are uniform across the entire collection, our results suggest that it is possible to predict precision and recall measures from classification accuracy and vice versa.
Keywords :
content-based retrieval; document image processing; information retrieval; pattern classification; classification recognition accuracy; content classification accuracy; document content inventory; document image collections; document image content extraction; document retrieval; retrieval performance; retrieval query; Content based retrieval; Image analysis; Image color analysis; Image recognition; Image retrieval; Information retrieval; Performance analysis; Pixel; Testing; Text analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
ISSN :
1520-5363
Print_ISBN :
978-0-7695-2822-9
Type :
conf
DOI :
10.1109/ICDAR.2007.4378682
Filename :
4378682