• DocumentCode
    1924970
  • Title

    Improving I/O Throughput with PRIMACY: Preconditioning ID-Mapper for Compressing Incompressibility

  • Author

    Shah, Neil ; Schendel, Eric R. ; Lakshminarasimhan, Sriram ; Pendse, Saurabh V. ; Rogers, Terry ; Samatova, Nagiza F.

  • Author_Institution
    Dept. of Comput. Sci., North Carolina State Univ., Raleigh, NC, USA
  • fYear
    2012
  • fDate
    24-28 Sept. 2012
  • Firstpage
    209
  • Lastpage
    219
  • Abstract
    The ability to efficiently handle massive amounts of data is necessary for the continuing development towards exascale scientific data-mining applications and database systems. Unfortunately, recent years have shown a growing gap between the size and complexity of data produced by scientific applications and the limited I/O bandwidth available on modern high-performance computing systems. Using data compression to reduce I/O activity offers a promising means of addressing this problem. However, the standard compression algorithms previously explored for this purpose offer limited gains in both end-to-end throughput and storage. In this paper, we introduce an in-situ compression scheme aimed at improving end-to-end I/O throughput as well as reducing dataset size. Our technique, PRIMACY (PReconditioning Id-MApper for Compressing incompressibilitY), acts as a preconditioner for standard compression libraries by modifying the representation of the original floating-point scientific data to increase byte-level repeatability, allowing standard lossless compressors to take advantage of their entropy-based byte-level encoding schemes. We additionally present a theoretical model for compression efficiency in high-performance computing environments and evaluate the efficiency of our approach via comparative analysis. Based on our evaluations on 20 real-world scientific datasets, PRIMACY achieved up to 38% and 22% improvements over standard end-to-end write and read throughput, respectively, in addition to a 25% increase in compression ratios paired with a 3-to-4-fold improvement in both compression and decompression throughput over general-purpose compressors.
  • Keywords
    data compression; data mining; data reduction; data structures; entropy; storage management; I/O activity; I/O bandwidth; PRIMACY; Preconditioning ID-MApper for Compressing incompressibility; byte-level repeatability; comparative analysis; compression algorithm; compression efficiency; compression library; data complexity; data compression; data handling; data size; database system; dataset size reduction; decompression; end-to-end I/O throughput; end-to-end read throughput; end-to-end write throughput; entropy-based byte-level encoding scheme; exascale scientific data-mining application; floating-point scientific data representation; high-performance computing environment; high-performance computing system; in-situ compression scheme; lossless compressor; preconditioning ID-mapper; scientific application; storage front; Bandwidth; Compressors; Data models; Encoding; Pipelines; Standards; Throughput; I/O; Lossless Compression; Performance Modeling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2012 IEEE International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4673-2422-9
  • Type

    conf

  • DOI
    10.1109/CLUSTER.2012.16
  • Filename
    6337782