  • DocumentCode
    1783312
  • Title
    Pipelined Compaction for the LSM-Tree
  • Author
    Zigang Zhang ; Yinliang Yue ; Bingsheng He ; Jin Xiong ; Mingyu Chen ; Lixin Zhang ; Ninghui Sun
  • Author_Institution
    SKL Comput. Archit., ICT, China
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    777
  • Lastpage
    786
  • Abstract
    Write-optimized data structures such as the Log-Structured Merge-tree (LSM-tree) and its variants are widely used in key-value storage systems such as Bigtable and Cassandra. Because writes are deferred and batched, LSM-tree-based storage systems need background compactions to merge key-value entries and keep them sorted for future queries and scans. Background compactions play a key role in the performance of LSM-tree-based storage systems. Existing studies of background compaction focus on decreasing the compaction frequency, reducing I/O, or confining compactions to hot data key ranges. They pay little attention to the computation time of background compactions. However, computation time is no longer negligible: in storage systems using flash-based SSDs, computation can take more than 60% of the total compaction time. Therefore, an alternative way to speed up compaction is to make good use of the parallelism of the underlying hardware, including CPUs and I/O devices. In this paper, we analyze the compaction procedure, identify the performance bottleneck, and propose the Pipelined Compaction Procedure (PCP) to better utilize the parallelism of CPUs and I/O devices. Theoretical analysis proves that PCP can improve the compaction bandwidth. Furthermore, we implement PCP in a real system and conduct extensive experiments. The experimental results show that the pipelined compaction procedure can increase the compaction bandwidth and storage system throughput by 77% and 62%, respectively.
  • Keywords
    merging; parallel processing; performance evaluation; pipeline processing; storage management; tree data structures; CPU; I/O devices; LSM-tree based storage systems; PCP; background compactions; compaction bandwidth improvement; compaction frequency reduction; computation time; key-value entries; key-value storage systems; log-structured merge-tree; performance bottleneck; pipelined compaction procedure; storage system throughput; write-optimized data structures; Bandwidth; Compaction; Data structures; Hardware; Indexes; Parallel processing; Pipelines; LSM-tree; compaction; pipeline; storage system;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
  • Conference_Location
    Phoenix, AZ
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4799-3799-8
  • Type
    conf
  • DOI
    10.1109/IPDPS.2014.85
  • Filename
    6877309