  • DocumentCode
    569064
  • Title
    On the Use of GPUs in Realizing Cost-Effective Distributed RAID
  • Author
    Khasymski, Aleksandr; Rafique, M. Mustafa; Butt, Ali R.; Vazhkudai, Sudharshan S.; Nikolopoulos, Dimitrios S.
  • Author_Institution
    Virginia Tech, Blacksburg, VA, USA
  • fYear
    2012
  • fDate
    7-9 Aug. 2012
  • Firstpage
    469
  • Lastpage
    478
  • Abstract
    The exponential growth in user and application data calls for new means of providing fault tolerance and protection against data loss. High Performance Computing (HPC) storage systems, which are at the forefront of handling the data deluge, typically employ hardware RAID at the backend. However, such solutions are costly, do not ensure end-to-end data integrity, and can become a bottleneck during data reconstruction. In this paper, we design a flexible, fault-tolerant, and high-performance RAID-6 solution for a parallel file system (PFS). Our system utilizes low-cost, strategically placed GPUs, on both the client and server sides, to accelerate parity computation. In contrast to hardware-based approaches, we provide full control over the size, length, and location of a RAID array on a per-file basis, end-to-end data integrity checking, and parallelization of RAID array reconstruction. We have deployed our system in conjunction with the widely used Lustre PFS, and show that our approach is feasible and imposes acceptable overhead.
  • Keywords
    RAID; client-server systems; data integrity; file organisation; parallel processing; software fault tolerance; GPU; HPC storage systems; PFS; RAID array length; RAID array location; RAID array reconstruction parallelization; RAID array size; application data; client sides; cost-effective distributed RAID; data deluge handling; data loss protection; data reconstruction; end-to-end data integrity checking; fault tolerance; high-performance computing storage systems; parallel file system; parity computation; redundant arrays of inexpensive disks; server sides; Acceleration; Arrays; Encoding; Graphics processing unit; Hardware; Kernel;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS), 2012 IEEE 20th International Symposium on
  • Conference_Location
    Washington, DC
  • ISSN
    1526-7539
  • Print_ISBN
    978-1-4673-2453-3
  • Type
    conf
  • DOI
    10.1109/MASCOTS.2012.59
  • Filename
    6298207