• DocumentCode
    2192381
  • Title

    The retrieval of document images: a brief survey

  • Author

    Doermann, David

  • Author_Institution
    Language & Media Process. Lab., Maryland Univ., College Park, MD, USA
  • Volume
    2
  • fYear
    1997
  • fDate
    18-20 Aug 1997
  • Firstpage
    945
  • Abstract
    The economic feasibility of creating large databases of document images has left a tremendous need for robust ways to access the information these images contain. Printed documents are often scanned for archiving or in an attempt to move toward a paperless office and stored as images, but without adequate index information. In order to make full use of the capabilities of traditional database indexing and retrieval techniques, a full conversion of the document may be required. There are many factors, however, which may prohibit complete conversion, including its high cost, insufficient document quality, or the fact that parts of the document simply cannot be adequately represented in a converted form. In this paper, we provide a survey of methods developed by researchers to access document images without relying on complete and accurate conversion. We briefly discuss traditional text indexing techniques on imperfect data and the retrieval of partially converted documents, followed by a more complete review of techniques for the direct retrieval and characterization of document images, including text, drawings and graphics.
  • Keywords
    document image processing; indexing; information retrieval; visual databases; database indexing; document images; large databases; retrieval; text indexing; Costs; Educational institutions; Graphics; Image converters; Image databases; Image retrieval; Indexes; Indexing; Information retrieval; Laboratories
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on
  • Conference_Location
    Ulm
  • Print_ISBN
    0-8186-7898-4
  • Type

    conf

  • DOI
    10.1109/ICDAR.1997.620650
  • Filename
    620650