• DocumentCode
    2028593
  • Title
    Reclaiming space from duplicate files in a serverless distributed file system
  • Author
    Douceur, John R. ; Adya, Atul ; Bolosky, William J. ; Simon, Dan ; Theimer, Marvin

  • fYear
    2002
  • fDate
    2002
  • Firstpage
    617
  • Lastpage
    624
  • Abstract
    The Farsite distributed file system provides availability by replicating each file onto multiple desktop computers. Since this replication consumes significant storage space, it is important to reclaim used space where possible. Measurement of over 500 desktop file systems shows that nearly half of all consumed space is occupied by duplicate files. We present a mechanism to reclaim space from this incidental duplication to make it available for controlled file replication. Our mechanism includes: (1) convergent encryption, which enables duplicate files to be coalesced into the space of a single file, even if the files are encrypted with different users' keys; and (2) SALAD, a Self-Arranging Lossy Associative Database for aggregating file content and location information in a decentralized, scalable, fault-tolerant manner. Large-scale simulation experiments show that the duplicate-file coalescing system is scalable, highly effective, and fault-tolerant.
  • Keywords
    content-addressable storage; cryptography; network operating systems; replicated databases; self-organising storage; software fault tolerance; storage management; Farsite; SALAD; Self-Arranging Lossy Associative Database; availability; controlled file replication; convergent encryption; decentralized scalable system; desktop file systems; duplicate files; duplicate-file coalescing system; fault-tolerant system; file content aggregation; large-scale simulation; location information; serverless distributed file system; storage space reclamation; Availability; Cryptography; Databases; Distributed computing; Extraterrestrial measurements; Fault diagnosis; File servers; File systems; Large-scale systems; Secure storage;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing Systems, 2002. Proceedings. 22nd International Conference on
  • ISSN
    1063-6927
  • Print_ISBN
    0-7695-1585-1
  • Type
    conf
  • DOI
    10.1109/ICDCS.2002.1022312
  • Filename
    1022312