DocumentCode
1783227
Title
Enabling In-Situ Data Analysis for Large Protein-Folding Trajectory Datasets
Author
Zhang, Boyu ; Estrada, Trilce ; Cicotti, Pietro ; Taufer, Michela
fYear
2014
fDate
19-23 May 2014
Firstpage
221
Lastpage
230
Abstract
This paper presents a one-pass, distributed method that enables in-situ data analysis for large protein-folding trajectory datasets by executing sufficiently fast, avoiding the movement of trajectory data, and limiting memory usage. First, the method extracts the geometric shape features of each protein conformation in parallel. Then, it classifies sets of consecutive conformations into meta-stable and transition stages using a probabilistic hierarchical clustering method. Lastly, it rebuilds the global knowledge necessary for the intra- and inter-trajectory analysis through a reduction operation. The comparison of our method with a traditional approach for a villin headpiece subdomain shows that our method generates significant improvements in execution time, memory usage, and data movement. Specifically, to analyze the same trajectory consisting of 20,000 protein conformations, our method runs in 41.5 seconds while the traditional approach takes approximately 3 hours; it uses 6.9 MB of memory per core while the traditional method uses 16 GB on the single node where the analysis is performed; and it communicates only 4.4 KB while the traditional method moves the entire dataset of 539 MB. The overall results in this paper support our claim that our method is suitable for in-situ data analysis of folding trajectories.
Keywords
bioinformatics; data analysis; distributed processing; pattern clustering; proteins; distributed method; geometric shape features; global knowledge necessary; in-situ data analysis; intertrajectory analysis; intratrajectory analysis; large protein-folding trajectory datasets; memory usage; probabilistic hierarchical clustering method; protein conformation; villin headpiece subdomain; Correlation; Crystals; Data analysis; Data mining; Feature extraction; Proteins; Trajectory;
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location
Phoenix, AZ
ISSN
1530-2075
Print_ISBN
978-1-4799-3799-8
Type
conf
DOI
10.1109/IPDPS.2014.33
Filename
6877257