  • DocumentCode
    168278
  • Title
    Bridging the gap between real world repositories and Scalable Preservation Environments
  • Author
    Jurik, Bolette Ammitzboll ; Blekinge, Asger Askov ; Ferneke-Nielsen, Rune Bruun ; Moldrup-Dalum, Per
  • Author_Institution
    State & Univ. Libr., Aarhus, Denmark
  • fYear
    2014
  • fDate
    8-12 Sept. 2014
  • Firstpage
    127
  • Lastpage
    136
  • Abstract
    Integrating large-scale processing environments, such as Hadoop, with traditional repository systems, such as Fedora Commons 3, has long proved a daunting task. In this paper we show how this integration can be achieved using software developed in the SCAPE project. The SCAPE integration is based on four steps: retrieving the metadata records from the repository, reading the records and their references to data files, updating the records, and storing them back in the repository. This allows full use of the Hadoop system for massively distributed processing without causing excessive load on the repository.
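The four-step integration described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the SCAPE software: it assumes an in-memory repository standing in for Fedora Commons 3 and a plain Python loop standing in for Hadoop map tasks. The names `Record`, `InMemoryRepository`, `characterise`, `process_record` and `run_pipeline` are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    """Hypothetical metadata record: an identifier, data-file references, properties."""
    pid: str
    file_refs: List[str]
    properties: Dict[str, str] = field(default_factory=dict)


class InMemoryRepository:
    """Hypothetical stand-in for a repository such as Fedora Commons 3."""

    def __init__(self, records: List[Record]) -> None:
        self._records = {r.pid: r for r in records}
        self.requests = 0  # counts bulk round trips made against the repository

    def export_all(self) -> List[Record]:
        # Step 1: retrieve all metadata records in a single bulk request (copied).
        self.requests += 1
        return [Record(r.pid, list(r.file_refs), dict(r.properties))
                for r in self._records.values()]

    def import_all(self, records: List[Record]) -> None:
        # Step 4: store the updated records back in a single bulk request.
        self.requests += 1
        for r in records:
            self._records[r.pid] = r

    def get(self, pid: str) -> Record:
        return self._records[pid]


def characterise(file_ref: str) -> str:
    """Hypothetical per-file task: derive a format label from the file extension."""
    return file_ref.rsplit(".", 1)[-1].lower() if "." in file_ref else "unknown"


def process_record(record: Record) -> Record:
    # Steps 2 and 3: read the record's file references, then update the record.
    # In a real deployment this work would run inside distributed map tasks.
    formats = sorted({characterise(ref) for ref in record.file_refs})
    record.properties["formats"] = ",".join(formats)
    return record


def run_pipeline(repo: InMemoryRepository) -> None:
    records = repo.export_all()
    updated = [process_record(r) for r in records]  # local stand-in for a distributed map
    repo.import_all(updated)
```

In this sketch the repository is contacted only twice (one bulk export, one bulk import) regardless of how many records are processed, which is one way to read the abstract's point about avoiding excessive load on the repository while the per-record work is distributed.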
  • Keywords
    distributed processing; meta data; software engineering; Hadoop system; SCAPE project; daunting task; large scale processing environments; meta data records; real world repositories; scalable preservation environments; software development; Connectors; Data models; Educational institutions; Feature extraction; Libraries; Servers; TV; Apache Hadoop; Digital Preservation; Digital Repository; File Characterisation; Integration; JPEG 2000; Preservation Action; Preservation Policies; Scalability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Digital Libraries (JCDL), 2014 IEEE/ACM Joint Conference on
  • Conference_Location
    London
  • Type
    conf
  • DOI
    10.1109/JCDL.2014.6970158
  • Filename
    6970158