• DocumentCode
    3020648
  • Title
    Evaluation of a user-assisted archive construction system for online natural history archives
  • Author
    He, J.; Downton, A.C.
  • Author_Institution
    Dept. of Electron. Syst. Eng., Essex Univ., Colchester, UK
  • fYear
    2005
  • fDate
    29 Aug.-1 Sept. 2005
  • Firstpage
    442
  • Abstract
    The creation of structured digital libraries from paper-based archives is an area of growing demand in many scientific and cultural fields, and is satisfied neither by off-the-shelf OCR nor by commercial form-processing systems. This paper describes and evaluates a configurable archive construction system, which integrates document image pre-processing and analysis with text post-processing tools and a standard OCR package. The prototype system is currently being used in conjunction with the UK Natural History Museum to help convert more than 500,000 cards of Lepidoptera and Coleoptera to a searchable digital archive. Evaluation results are summarised for two datasets comprising over 5,000 cards selected from different parts of this database, and indicate that overall end-to-end word recognition rates of 70-90% are readily achievable for key data fields, subject to the availability of suitable electronic dictionaries.
  • Keywords
    digital libraries; document image processing; history; optical character recognition; text analysis; UK Natural History Museum; document image analysis; document image pre-processing; online natural history archives; user-assisted archive construction system; Cultural differences; Databases; Dictionaries; Image analysis; Image converters; Optical character recognition software; Packaging; Prototypes; Software libraries
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
  • ISSN
    1520-5263
  • Print_ISBN
    0-7695-2420-6
  • Type
    conf
  • DOI
    10.1109/ICDAR.2005.107
  • Filename
    1575585