Title :
Parallel Information-Theory-Based Construction of Genome-Wide Gene Regulatory Networks
Author :
Zola, Jaroslaw ; Aluru, Maneesha ; Sarje, Abhinav ; Aluru, Srinivas
Author_Institution :
Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA
Abstract :
Constructing genome-wide gene regulatory networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, none of them is parallel, and they do not scale to the whole genome level or incorporate the largest data sets, particularly with rigorous statistical techniques. In this paper, we present a parallel method integrating mutual information, data processing inequality, and statistical testing to detect significant dependencies between genes, and efficiently exploit parallelism inherent in such computations. We present a new method to carry out permutation testing for assessing statistical significance of interactions, while reducing its computational complexity by a factor of Θ(n²), where n is the number of genes. Using both synthetic and known regulatory networks, we show that our method produces networks of quality similar to ARACNe, a widely used mutual-information-based method. We further explore the use of accelerators for gene network construction by presenting a parallelization on a cluster of IBM Cell blades. We exploit parallelization across multiple Cells, multiple cores within each Cell, and vector units within the cores to develop a high-performance implementation that effectively addresses the scaling problem. We report the first inference of a plant whole genome network by constructing a 15,222 gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in 30 minutes on a 2,048-CPU IBM Blue Gene/L, and in 2 hours and 25 minutes on an 8-node Cell blade cluster.
Keywords :
biology computing; cellular biophysics; genetics; parallel algorithms; statistical testing; Arabidopsis thaliana; IBM Blue Gene/L; IBM cell blades; cell blade cluster; computational complexity; data processing inequality; gene network construction; genome-wide gene regulatory network; large-scale gene expression data; mutual information; parallel information theory; parallel method; permutation testing; statistical technique; statistical testing; systems biology; Bioinformatics; Blades; Data processing; Gene expression; Genetic communication; Genomics; Information theory; Large-scale systems; Mutual information; Systems biology; Parallel algorithms; biology and genetics
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
DOI :
10.1109/TPDS.2010.59
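The abstract above combines three ingredients: pairwise mutual information (MI), permutation testing for significance, and the data processing inequality (DPI). The Python sketch below is a minimal, sequential illustration of how these pieces can fit together, under assumptions of our own. MI is a simple plug-in estimate over equal-width bins of rank-transformed profiles, which is a stand-in and not necessarily the paper's estimator. The significance threshold comes from a single pooled permutation null. Once profiles are ranked, each one is a permutation of 0..m-1, so the MI null distribution is identical for every gene pair. One set of permutations can therefore serve all n(n-1)/2 pairs instead of one set per pair. This illustrates one way a Θ(n²) saving in permutation testing can arise; it is not claimed to be the paper's exact procedure. The DPI step removes the weakest edge of every fully connected triple, in the style of ARACNe. All function names (`ranks`, `mutual_information`, `null_threshold`, `apply_dpi`, `build_network`) are hypothetical, and no parallelization is shown.

```python
import math
import random
from collections import Counter
from itertools import combinations


def ranks(values):
    """Rank-transform a profile to 0..m-1 (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r


def mutual_information(x, y, bins):
    """Plug-in MI (nats) between two rank vectors, each a permutation of 0..m-1,
    using `bins` equal-width bins over the ranks (an assumed, simple estimator)."""
    m = len(x)
    bx = [v * bins // m for v in x]
    by = [v * bins // m for v in y]
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # Sum over cells of p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / m) * math.log(c * m / (px[a] * py[b]))
               for (a, b), c in joint.items())


def null_threshold(m, bins, num_perm, alpha, seed):
    """(1 - alpha) quantile of MI under independence, from one pooled permutation null.
    Every ranked profile is a permutation of 0..m-1, so this null is the same for
    every gene pair and is computed once rather than once per pair."""
    rng = random.Random(seed)
    base = list(range(m))
    null = []
    for _ in range(num_perm):
        shuffled = base[:]
        rng.shuffle(shuffled)
        null.append(mutual_information(base, shuffled, bins))
    null.sort()
    return null[min(math.ceil((1 - alpha) * num_perm), num_perm - 1)]


def apply_dpi(edges):
    """ARACNe-style DPI: in every triangle of edges, drop the weakest edge.
    Checks use the original edge set, so the visiting order of triangles does not matter."""
    nodes = sorted({v for e in edges for v in e})
    weakest = set()
    for i, j, k in combinations(nodes, 3):
        tri = [frozenset((i, j)), frozenset((i, k)), frozenset((j, k))]
        if all(e in edges for e in tri):
            weakest.add(min(tri, key=edges.get))
    return {e: w for e, w in edges.items() if e not in weakest}


def build_network(profiles, bins=4, num_perm=1000, alpha=0.01, seed=0):
    """Hypothetical end-to-end sketch: rank, test every pair against one threshold, prune."""
    ranked = [ranks(p) for p in profiles]
    threshold = null_threshold(len(profiles[0]), bins, num_perm, alpha, seed)
    edges = {}
    for i, j in combinations(range(len(profiles)), 2):
        mi = mutual_information(ranked[i], ranked[j], bins)
        if mi > threshold:
            edges[frozenset((i, j))] = mi
    return apply_dpi(edges)
```

As a usage example, consider three 40-sample profiles: `list(range(40))`, an identical copy of it, and `[(i % 2) * 20 + i // 2 for i in range(40)]`, which is exactly balanced against the first two. Calling `build_network` on these with `bins=2`, `num_perm=200`, and `alpha=0.05` keeps only the edge between genes 0 and 1. The design choice to pool the null trades per-pair permutation work for a single shared threshold. That sharing is valid here only because ranking gives every profile the same marginal distribution.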