• DocumentCode
    3706502
  • Title
    Accelerating I/O Performance of Big Data Analytics on HPC Clusters through RDMA-Based Key-Value Store
  • Author
    Nusrat Sharmin Islam;Dipti Shankar;Xiaoyi Lu;Md. Wasi-Ur-Rahman;Dhabaleswar K. Panda

  • Year
    2015
  • Firstpage
    280
  • Lastpage
    289
  • Abstract
    The Hadoop Distributed File System (HDFS) is the underlying storage engine of many Big Data processing frameworks such as Hadoop MapReduce, HBase, Hive, and Spark. Even though HDFS is well known for its scalability and reliability, its requirement for large amounts of local storage space makes HDFS deployment challenging on HPC clusters. Moreover, HPC clusters usually have large installations of parallel file systems such as Lustre. In this study, we propose a novel design to integrate HDFS with Lustre through a high-performance key-value store. We design a burst buffer system using RDMA-based Memcached and present three schemes to integrate HDFS with Lustre through this buffer layer, considering different aspects of I/O, data locality, and fault tolerance. Our proposed schemes ensure performance improvement for Big Data applications on HPC clusters while reducing the local storage requirement. Performance evaluations show that our design can improve the write performance of TestDFSIO by up to 2.6x over HDFS and 1.5x over Lustre. The gain in read throughput is up to 8x. Sort execution time is reduced by up to 28% over Lustre and 19% over HDFS. Our design can also significantly benefit I/O-intensive workloads compared to both HDFS and Lustre.
  • Keywords
    "Servers","Big data","Buffer storage","Computer architecture","File systems","Buffer layers","Service-oriented architecture"
  • Publisher
    ieee
  • Conference_Title
    2015 44th International Conference on Parallel Processing (ICPP)
  • ISSN
    0190-3918
  • Type
    conf
  • DOI
    10.1109/ICPP.2015.79
  • Filename
    7349583