  • DocumentCode
    2528819
  • Title
    stdchk: A Checkpoint Storage System for Desktop Grid Computing
  • Author
    Al-Kiswany, Samer ; Ripeanu, Matei ; Vazhkudai, Sudharshan S. ; Gharaibeh, Abdullah
  • Author_Institution
    Univ. of British Columbia, Vancouver, BC
  • fYear
    2008
  • fDate
    17-20 June 2008
  • Firstpage
    613
  • Lastpage
    624
  • Abstract
    Checkpointing is an indispensable technique to provide fault tolerance for long-running high-throughput applications like those running on desktop grids. This article argues that a checkpoint storage system, optimized to operate in these environments, can offer multiple benefits: reduce the load on a traditional file system, offer high performance through specialization, and, finally, optimize data management by taking into account checkpoint application semantics. Such a storage system can present a unifying abstraction to checkpoint operations, while hiding the fact that there are no dedicated resources to store the checkpoint data. We prototype stdchk, a checkpoint storage system that uses scavenged disk space from participating desktops to build a low-cost storage system, offering a traditional file system interface for easy integration with applications. This article presents the stdchk architecture, key performance optimizations, and its support for incremental checkpointing and increased data availability. Our evaluation confirms that the stdchk approach is viable in a desktop grid setting and offers a low-cost storage system with desirable performance characteristics: high write throughput as well as reduced storage space and network effort to save checkpoint images.
  • Keywords
    checkpointing; grid computing; software fault tolerance; storage management; checkpoint storage system; data management; desktop grid computing; fault tolerance; scavenged disk space; stdchk; traditional file system; Availability; Checkpointing; Costs; Environmental management; Fault tolerance; File systems; Grid computing; Image storage; Optimization; Prototypes; Checkpointing; Desktop Grids; Storage Systems;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Distributed Computing Systems, 2008. ICDCS '08. The 28th International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1063-6927
  • Print_ISBN
    978-0-7695-3172-4
  • Electronic_ISBN
    1063-6927
  • Type
    conf
  • DOI
    10.1109/ICDCS.2008.19
  • Filename
    4595934