DocumentCode
3497906
Title
The case of the digitized works at a National Digital Library
Author
Borbinha, José ; Gil, João ; Pedrosa, Gilberto ; Penas, João
Author_Institution
INESC-ID, Inst. de Engenharia de Sistemas e Computadores, Lisboa
fYear
2006
fDate
27-28 April 2006
Lastpage
125
Abstract
This paper describes the processing of digitised works at the BND - National Digital Library, in Portugal. This initiative created half a million digitised images from 25,000 titles of physical items. These represent a very heterogeneous sample of historical or more relevant items (printed monographs and newspapers, maps, manuscripts, drawings, etc.). The digitisation resulted in TIFF files, which need to be automatically processed to create the technical metadata, apply image processing actions, OCR, and word indexing, and to create derived copies for access in PNG, JPG, GIF, and PDF, as well as the master copies of each work for preservation. That process is fully automated through several XML schemas for the control of the processes, description of the results (including the OCR outputs), descriptive metadata (in Dublin Core, MARC XML, etc.), and rights and structural metadata (in METS).
Keywords
digital libraries; document image processing; image coding; indexing; GIF; JPG; National Digital Library; OCR; PDF; PNG; TIFF files; XML schemas; descriptive metadata; digitized images; digitized works; image processing; rights metadata; structural metadata; technical metadata; word indexing; Automatic control; Computer aided software engineering; Gas insulated transmission lines; Image processing; Image storage; Indexing; Optical character recognition software; Process control; Software libraries; XML;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Image Analysis for Libraries, 2006. DIAL '06. Second International Conference on
Conference_Location
Lyon
Print_ISBN
0-7695-2531-8
Type
conf
DOI
10.1109/DIAL.2006.42
Filename
1612954