DocumentCode :
1920606
Title :
MLOC: Multi-level Layout Optimization Framework for Compressed Scientific Data Exploration with Heterogeneous Access Patterns
Author :
Gong, Zhenhuan ; Rogers, Terry ; Jenkins, John ; Kolla, Hemanth ; Ethier, Stephane ; Chen, Jackie ; Ross, Robert ; Klasky, Scott ; Samatova, Nagiza F.
Author_Institution :
North Carolina State Univ., Raleigh, NC, USA
fYear :
2012
fDate :
10-13 Sept. 2012
Firstpage :
239
Lastpage :
248
Abstract :
The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their runtime environments. The growing gap is exacerbated by exploratory data-intensive analytics, such as querying simulation data for regions of interest with multivariate, spatio-temporal constraints. Query-driven data exploration induces heterogeneous access patterns that further stress the performance of the underlying storage system. To partially alleviate the problem, data reduction via compression and multi-resolution data extraction are becoming an integral part of I/O systems. While addressing the data size issue, these techniques add yet another mix of access patterns to an already heterogeneous set of possibilities. Moreover, how extreme-scale datasets are partitioned into multiple files and organized on a parallel file system adds to an already combinatorial space of possible access patterns. To address this challenge, we present MLOC, a parallel Multilevel Layout Optimization framework for Compressed scientific spatio-temporal data at extreme scale. MLOC proposes multiple fine-grained data layout optimization kernels that form a generic core from which a broader constellation of such kernels can be organically consolidated to enable effective data exploration with various combinations of access patterns. Specifically, the kernels are optimized for access patterns induced by (a) query-driven multivariate, spatio-temporal constraints, (b) precision-driven data analytics, (c) compression-driven data reduction, (d) multi-resolution data sampling, and (e) multi-file data partitioning and organization on a parallel file system. MLOC organizes these optimization kernels within a multi-level architecture, in which all the levels can be flexibly re-ordered by user-defined priorities. When tested on query-driven exploration of compressed data, MLOC demonstrates superior performance compared with state-of-the-art scientific database management technologies.
Keywords :
data compression; data reduction; distributed databases; natural sciences computing; network operating systems; optimisation; query processing; sampling methods; spatiotemporal phenomena; storage management; I/O systems; MLOC; combinatorial access pattern space; compressed scientific data exploration; compressed scientific spatiotemporal data; compression-driven data reduction; data size issue; database management technologies; exploratory data-intensive analytics; extreme-scale datasets; heterogeneous access patterns; multifile data partitioning; multilevel architecture; multilevel layout optimization framework; multiple fine-grained data layout optimization kernels; multiresolution data extraction; multiresolution data sampling; parallel file systems; precision-driven data analytics; query driven multivariate spatiotemporal constraints; query-driven data exploration; runtime environments; scientific simulations; storage capability; Data compression; Data models; Kernel; Layout; Optimization; Organizations; Throughput;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Processing (ICPP), 2012 41st International Conference on
Conference_Location :
Pittsburgh, PA
ISSN :
0190-3918
Print_ISBN :
978-1-4673-2508-0
Type :
conf
DOI :
10.1109/ICPP.2012.39
Filename :
6337585