Title :
Parallelization of Bayesian network based SNPs pattern analysis and performance characterization on SMP/HT
Author :
Song, Justin ; Li, Eric ; Hu, Wei ; Ge, Steven ; Lai, Chunrong ; Zhang, Yimin ; Zhang, Xuegong ; Chen, Wenguang ; Zheng, Weimin
Author_Institution :
Corporate Technol. Group, Intel Corp., Santa Clara, CA, USA
Abstract :
Single nucleotide polymorphisms (SNPs) are subtle variations in the genomic DNA sequences of individuals of the same species. They play a key role in the pharmaceutical industry in understanding variations in drug treatment responses between individuals at the molecular level. Discovering patterns around SNP loci is very important for better understanding the possible origin of SNPs in evolution. Bayesian networks have been applied to this problem with promising results. Since Bayesian network based SNP pattern analysis has high computational complexity, we parallelized this workload on Intel Xeon SMP systems. The task-level parallelism of the SNP workload is exploited. Experimental results show that memory is the bottleneck: on an 8-way Xeon SMP system with hyper-threading enabled, system memory bandwidth is fully saturated and memory load access latency is roughly 50% longer than on a single-processor system. Another interesting result is that Intel's hyper-threading technology improves the multithreaded workload's performance, giving a 1.6X speedup. Workload profiling shows that the data sharing nature of the parallel SNP workload matches hyper-threading's cache sharing mechanism, and thus greatly reduces cache coherency protocol traffic on the shared front side bus. Scalability analysis shows that imbalance and locks are two major factors that may limit the parallel workload's speedup on platforms with more processors.
Keywords :
DNA; belief networks; biology computing; cache storage; computational complexity; data mining; genetics; multi-threading; pattern recognition; Bayesian network; Intel Xeon SMP systems; SMP/HT; SNP pattern analysis; Xeon SMP hyper-threading enabled system; cache coherency protocol traffic; computational complexity; data sharing; drug treatment; genomic DNA sequence; hyper-threading cache sharing; hyper-threading technology; memory load access latency; multithreaded workload performance; network parallelization; parallel SNP; parallel workload; pattern discovery; pharmaceutical industry; processor platform; scalability analysis; shared front side bus; single nucleotide polymorphisms; single processor system; task level parallelism; workload profiling; system memory bandwidth; Bayesian methods; Bioinformatics; Computational complexity; DNA; Drugs; Genomics; Parallel processing; Pattern analysis; Pharmaceuticals; Sequences;
Conference_Title :
Parallel and Distributed Systems, 2004. ICPADS 2004. Proceedings. Tenth International Conference on
Print_ISBN :
0-7695-2152-5
DOI :
10.1109/ICPADS.2004.1316110