  • DocumentCode
    1858061
  • Title
    CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart
  • Author
    Ouyang, Xiangyong ; Rajachandrasekar, Raghunath ; Besseron, Xavier ; Wang, Hao ; Huang, Jian ; Panda, Dhabaleswar K.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
  • fYear
    2011
  • fDate
    13-16 Sept. 2011
  • Firstpage
    375
  • Lastpage
    384
  • Abstract
    Checkpoint/Restart (C/R) mechanisms have been widely adopted by many MPI libraries [1-3] to achieve fault tolerance. However, a major limitation of such mechanisms is the severe I/O bottleneck caused by the need to dump the snapshots of all processes to persistent storage. Several studies have sought to minimize this overhead [4,5], but most of the proposed optimizations are implemented inside a specific MPI stack, checkpointing library, or application, so they are not portable to other MPI stacks and applications. In this paper, we propose a filesystem-based approach to alleviate this checkpoint I/O bottleneck. We propose a new filesystem, named Checkpoint-Restart Filesystem (CRFS), which is a lightweight user-level filesystem based on FUSE (Filesystem in Userspace). CRFS is designed with Checkpoint/Restart I/O traffic in mind to handle concurrent write requests efficiently. Any software component that uses standard filesystem interfaces can transparently benefit from CRFS's capabilities. CRFS intercepts the checkpoint file write system calls and aggregates them into fewer, larger chunks, which are written asynchronously to the underlying filesystem for more efficient I/O. CRFS manages a flexible internal I/O thread pool that throttles concurrent I/O to reduce contention and improve I/O performance. CRFS can be mounted over any standard filesystem, such as ext3, NFS, and Lustre. We have implemented CRFS and evaluated its performance using three popular C/R-capable MPI stacks: MVAPICH2, MPICH2, and OpenMPI. Experimental results show significant performance gains for all three MPI stacks. CRFS achieves up to a 5.5X speedup in checkpoint writing performance to the Lustre filesystem. Similar levels of improvement are also obtained with the ext3 and NFS filesystems. To the best of our knowledge, this is the first such portable and lightweight filesystem designed for generic Checkpoint/Restart data.
  • Keywords
    application program interfaces; checkpointing; input-output programs; software fault tolerance; Lustre filesystem; MPI libraries; MPI stack; MPICH2; MVAPICH2; NFS; OpenMPI; checkpoint IO bottleneck; checkpoint-restart I/O traffic; checkpoint-restart filesystem; checkpointing library; concurrent write requests; ext3; fault tolerance; filesystem in userspace; flexible internal IO thread pool; lightweight user level filesystem; optimizations; Checkpointing; Fault tolerance; Fuses; Kernel; Libraries; Optimization; Writing; checkpoint-restart; userspace filesystem; write aggregation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing (ICPP), 2011 International Conference on
  • Conference_Location
    Taipei City
  • ISSN
    0190-3918
  • Print_ISBN
    978-1-4577-1336-1
  • Electronic_ISBN
    0190-3918
  • Type
    conf
  • DOI
    10.1109/ICPP.2011.85
  • Filename
    6047205
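
The abstract describes two mechanisms: aggregating many small checkpoint writes into fewer, larger chunks, and writing those chunks asynchronously through a bounded I/O thread pool. The following is a minimal Python sketch of that general idea only. It is not the CRFS implementation (which is built on FUSE), and every name in it (`AggregatingWriter`, `CHUNK_SIZE`, `demo`) as well as the chunk size and pool size are assumptions made for illustration. It relies on `os.pwrite`, which is available on Unix-like systems.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical default; the abstract does not state a chunk size.
CHUNK_SIZE = 4 * 1024 * 1024


class AggregatingWriter:
    """Buffers sequential writes and flushes them as large chunks on a shared pool."""

    def __init__(self, path, pool, chunk_size=CHUNK_SIZE):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        self.pool = pool              # bounded pool: limits concurrent backend writes
        self.chunk_size = chunk_size
        self.buf = bytearray()        # data accepted but not yet submitted
        self.offset = 0               # file offset at which self.buf begins
        self.pending = []             # futures for chunks in flight
        self.flushes = 0              # number of chunks submitted so far

    def write(self, data):
        """Accept a (possibly small) write; submit full chunks as they fill up."""
        self.buf += data
        while len(self.buf) >= self.chunk_size:
            self._submit(self.chunk_size)
        return len(data)

    def _submit(self, n):
        chunk = bytes(self.buf[:n])
        del self.buf[:n]
        self.pending.append(self.pool.submit(self._pwrite_all, chunk, self.offset))
        self.offset += n
        self.flushes += 1

    def _pwrite_all(self, chunk, offset):
        # pwrite may write fewer bytes than requested, so loop until done.
        view = memoryview(chunk)
        while len(view) > 0:
            written = os.pwrite(self.fd, view, offset)
            view = view[written:]
            offset += written

    def close(self):
        """Flush the partial tail chunk, wait for all in-flight writes, then close."""
        if self.buf:
            self._submit(len(self.buf))
        for future in self.pending:
            future.result()           # re-raises any I/O error from a worker
        os.close(self.fd)


def demo():
    """Issue 100 writes of 100 bytes each; return (chunks flushed, file contents)."""
    with tempfile.TemporaryDirectory() as tmp, ThreadPoolExecutor(max_workers=2) as pool:
        path = os.path.join(tmp, "ckpt.img")
        writer = AggregatingWriter(path, pool, chunk_size=1024)
        for i in range(100):
            writer.write(bytes([i]) * 100)
        writer.close()
        with open(path, "rb") as f:
            data = f.read()
    return writer.flushes, data
```

In this sketch the pool's `max_workers` value plays the throttling role: however many writers share the pool, at most that many chunk writes reach the underlying filesystem at once. With a 1024-byte chunk size, the 100 small writes in `demo()` reach the backing file as 10 larger writes.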