Author :
Borbinha, José ; Gil, João ; Pedrosa, Gilberto ; Penas, João
Abstract :
This paper describes the processing of digitised works at the BND (National Digital Library) in Portugal. This initiative created half a million digitised images from 25,000 titles of physical items. These represent a very heterogeneous sample of historical or particularly relevant items (printed monographs and newspapers, maps, manuscripts, drawings, etc.). The digitisation resulted in TIFF files, which need to be processed automatically to create the technical metadata, apply image processing actions, perform OCR and word indexing, and create both derived copies for access (in PNG, JPG, GIF, and PDF) and master copies of each work for preservation. That process is described in this paper. It is fully automated through several XML schemas for the control of the processes, the description of the results (including the OCR outputs), descriptive metadata (in Dublin Core, MARC XML, etc.), and rights and structural metadata (in METS).
Keywords :
digital libraries; document image processing; image coding; indexing; GIF; JPG; National Digital Library; OCR; PDF; PNG; TIFF files; XML schemas; descriptive metadata; digitized images; digitized works; image processing; rights metadata; structural metadata; technical metadata; word indexing; Automatic control; Computer aided software engineering; Gas insulated transmission lines; Image processing; Image storage; Indexing; Optical character recognition software; Process control; Software libraries; XML;