DocumentCode
3592562
Title
Parallel index and query for large scale data analysis
Author
Chou, Jyh-Horng ; Wu, Kaijie ; Rubel, Oliver ; Howison, Mark ; Qiang, Ji ; Prabhat ; Austin, Brian ; Bethel, E. Wes ; Ryne, R.D. ; Shoshani, Arie
fYear
2011
Firstpage
1
Lastpage
11
Abstract
Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to the processing of a massive 50 TB dataset generated by a large-scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.
Keywords
data analysis; multiprocessing systems; parallel processing; query processing; FastQuery; I-O infrastructure; accelerator modeling code; data management; distributed multicore platforms; large datasets interactive exploration; large scale data analysis; parallel index; query technologies; scientific datasets; software framework; supercomputing platforms; Arrays; Buildings; Indexing; Parallel processing; Program processors
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for
Electronic_ISBN
978-1-4503-0771-0
Type
conf
Filename
6114446
Link To Document