  • DocumentCode
    611059
  • Title
    PARLO: PArallel Run-Time Layout Optimization for Scientific Data Explorations with Heterogeneous Access Patterns
  • Author
    Gong, Zhenhuan; Boyuka, David A.; Zou, Xiaocheng; Liu, Qing; Podhorszki, Norbert; Klasky, Scott; Ma, Xiaosong; Samatova, N.F.
  • Author_Institution
    North Carolina State Univ., Raleigh, NC, USA
  • fYear
    2013
  • fDate
    13-16 May 2013
  • Firstpage
    343
  • Lastpage
    351
  • Abstract
    The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. This growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induce heterogeneous access patterns that stress the performance of the underlying storage system. Previous work has applied data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs. We present PARLO, a parallel run-time layout optimization framework that performs multi-level data layout optimization for scientific applications at run time, before data is written to storage. The layout schemes optimize for heterogeneous access patterns according to user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to achieve user-transparent, light-weight layout optimization for scientific datasets. It offers simple XML-based configuration, so users can obtain flexible layout optimization without modifying or recompiling application code. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its underlying run-time layout optimization saves 56% of processing time and reduces storage overhead by up to 50%. PARLO also has low run-time resource requirements and limits its performance impact on running applications to a reasonable level.
  • Keywords
    XML; middleware; optimisation; parallel processing; storage management; ADIOS; PARLO; XML-based configuration; data-intensive analytics; heterogeneous access patterns; high-performance parallel I/O middleware; large-scale HPC application; light-weight layout optimization; multilevel data layout optimization; multivariate constraint; parallel run-time layout optimization; scientific data exploration; spatio-temporal constraint; storage system; Arrays; Indexes; Layout; Middleware; Optimization; Writing; XML;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2013 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
  • Conference_Location
    Delft
  • Print_ISBN
    978-1-4673-6465-2
  • Type
    conf
  • DOI
    10.1109/CCGrid.2013.58
  • Filename
    6546111