DocumentCode
168278
Title
Bridging the gap between real world repositories and Scalable Preservation Environments
Author
Jurik, Bolette Ammitzboll ; Blekinge, Asger Askov ; Ferneke-Nielsen, Rune Bruun ; Moldrup-Dalum, Per
Author_Institution
State & Univ. Libr., Aarhus, Denmark
fYear
2014
fDate
8-12 Sept. 2014
Firstpage
127
Lastpage
136
Abstract
Integrating large-scale processing environments, such as Hadoop, with traditional repository systems, such as Fedora Commons 3, has long proved a daunting task. In this paper we show how this integration can be achieved using software developed in the SCAPE project. The SCAPE integration is based on four steps: retrieving the metadata records from the repository, reading the records and their references to data files, updating the records, and storing them back in the repository. This allows full use of the Hadoop system for massively distributed processing without causing excessive load on the repository.
Keywords
distributed processing; meta data; software engineering; Hadoop system; SCAPE project; daunting task; large scale processing environments; meta data records; real world repositories; scalable preservation environments; software development; Connectors; Data models; Educational institutions; Feature extraction; Libraries; Servers; TV; Apache Hadoop; Digital Preservation; Digital Repository; File Characterisation; Integration; JPEG 2000; Preservation Action; Preservation Policies; Scalability
fLanguage
English
Publisher
IEEE
Conference_Titel
Digital Libraries (JCDL), 2014 IEEE/ACM Joint Conference on
Conference_Location
London
Type
conf
DOI
10.1109/JCDL.2014.6970158
Filename
6970158