DocumentCode :
1639392
Title :
Clustered Workflow Execution of Retargeted Data Analysis Scripts
Author :
Wang, Daniel L. ; Zender, Charles S. ; Jenks, Stephen F.
Author_Institution :
Univ. of California, Irvine, CA
fYear :
2008
Firstpage :
449
Lastpage :
458
Abstract :
Supercomputing advances have enabled computational science data volumes to grow at ever-increasing rates, commonly resulting in more data produced than can be practically analyzed. Whole-dataset download costs have grown to impractical heights, even with multi-Gbps networks, forcing scientists to rely on server-side subsetting and limiting the scope of data they can analyze on a workstation. Our system supplements existing scientific data services with lightweight computational capability, providing a means of safely relocating analysis from the desktop to the server where clustered execution can be coordinated, exploiting data locality, reducing unnecessary data transfer, and providing end-users with results several times faster. We show how dataflow and other compiler-inspired analyses of shell scripts of scientists' most common analysis tools enable parallelization and optimizations in disk and network I/O bandwidth. We benchmark using an actual geo-science analysis script, illustrating the crucial performance gains of extracting workflows defined in scripts and optimizing their execution. Current results quantify significant improvements in performance, showing the promise of bringing transparent high-performance analysis to the scientist's desktop.
Keywords :
data analysis; pattern clustering; scientific information systems; workflow management software; clustered workflow execution; compiler-inspired analyses; computational science data volume; geo-science analysis script; retargeted data analysis script; safe relocating analysis; scientific data service; scientist shell script; supercomputing; Bandwidth; Costs; Data analysis; Distributed computing; Grid computing; Network servers; Optimizing compilers; Performance analysis; USA Councils; Workstations; cluster; compilation; data analysis; parallelism; scientific computing; scripting; service;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing and the Grid, 2008. CCGRID '08. 8th IEEE International Symposium on
Conference_Location :
Lyon
Print_ISBN :
978-0-7695-3156-4
Electronic_ISBN :
978-0-7695-3156-4
Type :
conf
DOI :
10.1109/CCGRID.2008.69
Filename :
4534249