• DocumentCode
    1784593
  • Title

    PLDSRC: A Multi-threaded Compressor/Decompressor for Massive DNA Sequencing Data

  • Author

    Ke Zhan ; Chao Yang ; Changyou Zhang ; Jingjing Zheng ; Ting Wang

  • Author_Institution
    Inst. of Software, Beijing, China
  • fYear
    2014
  • fDate
    24-27 Nov. 2014
  • Firstpage
    29
  • Lastpage
    33
  • Abstract
    To cope with the rapid growth of DNA sequencing data, it is important to study high-efficiency compression techniques that reduce the cost of storing massive amounts of sequencing data. In this paper, we propose a parallel DNA data compressor/decompressor, PLDSRC, based on the well-known serial DSRC software. We first analyze the compression and decompression algorithms in DSRC and identify three basic operations, namely read, work, and write. A single-pipeline parallel algorithm is then proposed to accelerate the compression/decompression procedure. To further exploit today's popular multi-core, multi-socket systems based on the non-uniform memory access (NUMA) architecture, we extend the single-pipeline approach to the multi-pipeline case. Experiments on two different platforms show that PLDSRC, in both single- and multi-pipeline forms, greatly speeds up DNA sequencing data compression/decompression while maintaining the same compression ratio. The maximum speedups of PLDSRC for compression and decompression are around 24.71x and 22.00x, respectively, compared to the serial DSRC software.
  • Keywords
    DNA; bioinformatics; data compression; multi-threading; multiprocessing systems; parallel algorithms; parallel architectures; NUMA architecture; PLDSRC; compressing ratio; compression algorithm analysis; data storage cost reduction; decompression algorithm analysis; massive DNA sequencing data; multicore multisocket systems; multipipeline approach; multithreaded compressor; multithreaded decompressor; nonuniform memory access architecture; parallel DNA data compressor; parallel DNA data decompressor; pipeline parallel algorithm; read operation; serial DSRC software; single-pipeline approach; work operation; write operation; Bioinformatics; DNA; Instruction sets; Pipelines; Sequential analysis; Software algorithms; DNA sequencing compression; DSRC; Multi-pipeline; NUMA; PLDSRC;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing and Applications to Business, Engineering and Science (DCABES), 2014 13th International Symposium on
  • Conference_Location
    Xian Ning
  • Print_ISBN
    978-1-4799-4170-4
  • Type

    conf

  • DOI
    10.1109/DCABES.2014.9
  • Filename
    6999050
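
The read, work, and write operations named in the abstract can be illustrated with a minimal single-pipeline sketch. This is not the PLDSRC implementation: `pipeline_compress`, the queue layout, and the use of `zlib` as a stand-in for the DSRC block compressor are all assumptions made for illustration. One reader thread feeds numbered blocks to a pool of workers, and one writer thread restores the original block order before emitting output.

```python
import queue
import threading
import zlib

SENTINEL = None  # marks end of stream on both queues


def pipeline_compress(blocks, n_workers=4):
    """Compress byte blocks with a read/work/write pipeline (illustrative only).

    zlib stands in for the real block compressor; it is NOT the DSRC algorithm.
    Returns the compressed blocks in their original order.
    """
    in_q = queue.Queue(maxsize=2 * n_workers)  # bounded: reader cannot run far ahead
    out_q = queue.Queue()
    results = []

    def reader():
        # "read": tag each block with its index so order can be restored later
        for idx, block in enumerate(blocks):
            in_q.put((idx, block))
        for _ in range(n_workers):
            in_q.put(SENTINEL)  # one stop signal per worker

    def worker():
        # "work": compress blocks independently of one another
        while True:
            item = in_q.get()
            if item is SENTINEL:
                out_q.put(SENTINEL)
                return
            idx, block = item
            out_q.put((idx, zlib.compress(block)))

    def writer():
        # "write": emit blocks strictly in index order, buffering early arrivals
        pending = {}
        next_idx = 0
        finished = 0
        while finished < n_workers:
            item = out_q.get()
            if item is SENTINEL:
                finished += 1
                continue
            idx, data = item
            pending[idx] = data
            while next_idx in pending:
                results.append(pending.pop(next_idx))
                next_idx += 1

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `pipeline_compress([b"ACGT" * 100, b"TTGA" * 50], n_workers=2)` returns two compressed blocks that decompress back to the inputs in the same order. A multi-pipeline variant in the spirit of the abstract would run several such pipelines side by side, for instance one per NUMA node; how PLDSRC partitions the work between pipelines is not stated in this record.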