Title :
Bridging the gap between real world repositories and Scalable Preservation Environments
Author :
Jurik, Bolette Ammitzboll ; Blekinge, Asger Askov ; Ferneke-Nielsen, Rune Bruun ; Moldrup-Dalum, Per
Author_Institution :
State and University Library, Aarhus, Denmark
Abstract :
Integrating large-scale processing environments, such as Hadoop, with traditional repository systems, such as Fedora Commons 3, has long proved a daunting task. In this paper we show how this integration can be achieved using software developed in the SCAPE project. The SCAPE integration is based on four steps: retrieving the metadata records from the repository, reading the records and their references to data files, updating the records, and storing them back in the repository. This allows full use of the Hadoop system for massively distributed processing without causing excessive load on the repository.
Keywords :
distributed processing; meta data; software engineering; Hadoop system; SCAPE project; daunting task; large scale processing environments; meta data records; real world repositories; scalable preservation environments; software development; Connectors; Data models; Educational institutions; Feature extraction; Libraries; Servers; TV; Apache Hadoop; Digital Preservation; Digital Repository; File Characterisation; Integration; JPEG 2000; Preservation Action; Preservation Policies; Scalability
Conference_Title :
Digital Libraries (JCDL), 2014 IEEE/ACM Joint Conference on
Conference_Location :
London
DOI :
10.1109/JCDL.2014.6970158