Author_Institution :
Nat. Libr. of Med., Bethesda, MD, USA
Abstract :
Summary form only given. In response to a crisis in the data entry of citations and abstracts of medical journal articles for the MEDLINE database, the authors developed a system, code-named MARS (Medical Article Record System), that combines keyboarding of citation data with scanning and OCR conversion of abstracts. MARS consists of keyboarded citation entry and multiple workstations of three types (scanner, proofing and editing). In addition, the system requires three servers: a network file server, an OCR server and a server that matches double-keyboarded citations. The performance of six OCR packages was compared in terms of detected errors, highlighted correct words and undetected errors. The OCR package for MARS was selected by minimizing the undetected error rate. The way in which MARS works is described. All workstations are networked via a LAN. A year after the system was placed in operation, it was providing over 20% of the total entry requirements of the National Library of Medicine (NLM). Current work aims to design a database-centered system that will provide more comprehensive automation and lower per-unit cost by incorporating subsystems for auto-zoning, page segmentation, automatic field identification and automated syntax reformatting.
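The selection criterion described in the abstract (choose the OCR package with the lowest undetected error rate) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the package names, the word counts, the `OcrResult` type and the `select_package` helper are hypothetical, and the abstract does not define the three error categories, so the readings given in the comments are one plausible interpretation rather than the authors' definitions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class OcrResult:
    """Per-package tallies from a hypothetical OCR comparison run."""
    package: str
    total_words: int
    detected_errors: int      # assumed: wrong words the package flagged
    highlighted_correct: int  # assumed: correct words flagged needlessly
    undetected_errors: int    # assumed: wrong words the package did not flag

    @property
    def undetected_error_rate(self) -> float:
        # Fraction of all words that are wrong and carry no flag.
        return self.undetected_errors / self.total_words


def select_package(results: Iterable[OcrResult]) -> OcrResult:
    """Return the result with the lowest undetected error rate."""
    return min(results, key=lambda r: r.undetected_error_rate)


# Made-up figures, not data from the study.
results = [
    OcrResult("package_a", 10_000, 120, 300, 15),
    OcrResult("package_b", 10_000, 200, 150, 40),
    OcrResult("package_c", 10_000, 90, 500, 8),
]

best = select_package(results)
print(best.package, best.undetected_error_rate)
```

In this made-up data, `package_c` is chosen even though it highlights the most correct words. Under the assumed reading, a flagged word can still be caught at a proofing workstation, whereas an undetected error passes through unnoticed, which would explain ranking packages on that rate alone.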
Keywords :
abstracting; bibliographic systems; image segmentation; local area networks; medical information systems; network servers; optical character recognition; software performance evaluation; software selection; LAN; MARS; MEDLINE database; Medical Article Record System; National Library of Medicine; OCR scanning; OCR server; abstracts; auto-zoning; automated syntax reformatting; automatic data entry; automatic field identification; biomedical databases; citation data; detected errors; double-keyboarded citations; highlighted correct words; keyboarded citation entry; keyboarding; medical journal articles; network file server; page segmentation; per-unit cost; software performance; subsystems; undetected error rate minimization; workstations; Abstracts; Databases; Error analysis; Error correction; File servers; Mars; Network servers; Optical character recognition software; Packaging; Workstations;