DocumentCode :
3497906
Title :
The case of the digitized works at a National Digital Library
Author :
Borbinha, José ; Gil, João ; Pedrosa, Gilberto ; Penas, João
Author_Institution :
INESC-ID, Inst. de Engenharia de Sistemas e Computadores, Lisboa
fYear :
2006
fDate :
27-28 April 2006
Lastpage :
125
Abstract :
This paper describes the processing of digitized works at the BND - National Digital Library, in Portugal. This initiative created half a million digitized images from 25,000 titles of physical items. These represent a very heterogeneous sample of historical or more relevant items (printed monographs and newspapers, maps, manuscripts, drawings, etc.). The digitization resulted in TIFF files, which need to be automatically processed to create the technical metadata, apply image processing actions, run OCR and word indexing, and create derived copies for access in PNG, JPG, GIF, and PDF, as well as the master copies of each of those works for preservation. That process is described in this paper. It is fully automated through several XML schemas for the control of the processes, the description of the results (including the OCR outputs), descriptive metadata (in Dublin Core, MARC XML, etc.), and rights and structural metadata (in METS).
Keywords :
digital libraries; document image processing; image coding; indexing; GIF; JPG; National Digital Library; OCR; PDF; PNG; TIFF files; XML schemas; descriptive metadata; digitized images; digitized works; image processing; rights metadata; structural metadata; technical metadata; word indexing; Automatic control; Computer aided software engineering; Gas insulated transmission lines; Image processing; Image storage; Indexing; Optical character recognition software; Process control; Software libraries; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Image Analysis for Libraries, 2006. DIAL '06. Second International Conference on
Conference_Location :
Lyon
Print_ISBN :
0-7695-2531-8
Type :
conf
DOI :
10.1109/DIAL.2006.42
Filename :
1612954