Title :
An improved data integration methodology for system biology
Author :
Zhou, Xiaodong ; George, E. Olusegun
Author_Institution :
Dept. of Anatomy & Neurobiol., Univ. of Tennessee Health Sci. Center, Memphis, TN, USA
Abstract :
Pooling P-values from independent experiments has been shown to improve the power of statistical tests. Instead of assigning equal weight to each dataset, Hwang et al. proposed a data integration methodology for system biology, labeled Pontillist, that pools data using weighted P-values so as to maximize the number of significant genes discovered. Pontillist uses a simulated null distribution of the weighted combination statistics. We have found several fatal statistical errors in Pontillist and provide corrections for them. Pontillist is also intrinsically computationally inefficient, requiring substantial, sometimes even prohibitive, computing time for convergence at low significance levels. We propose a new approach for the optimal combination of P-values that uses the approximate theoretical distributions of the Fisher's, Logit, and Z omnibus combination statistics to estimate the P-value of the weighted pooled statistics. Our computationally efficient approach guarantees convergence at any significance level and produces accurate pooled P-values.
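To illustrate what pooling P-values from a theoretical rather than a simulated null distribution can look like, the following is a minimal sketch and not the authors' implementation. It assumes one-sided P-values from independent experiments and uses two textbook forms: the weighted Z statistic, whose null distribution is exactly standard normal, and the unweighted Fisher statistic, whose null distribution is chi-square with 2k degrees of freedom. The function names and the weights are hypothetical, and the approximation the paper uses for the weighted Fisher and Logit statistics is not reproduced here.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def weighted_z_pvalue(pvalues, weights):
    """Pool one-sided P-values with a weighted Z (Stouffer-type) statistic.

    Under the null, sum(w_i * z_i) / sqrt(sum(w_i ** 2)) is standard normal,
    so the pooled P-value needs no simulation.
    """
    if not pvalues or len(pvalues) != len(weights):
        raise ValueError("pvalues and weights must be non-empty and equal in length")
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("each P-value must lie strictly between 0 and 1")
    # Small P-values map to large positive z-scores.
    z_scores = [-_STD_NORMAL.inv_cdf(p) for p in pvalues]
    pooled_z = sum(w * z for w, z in zip(weights, z_scores))
    pooled_z /= math.sqrt(sum(w * w for w in weights))
    # Upper-tail probability, written as cdf(-z) for accuracy in the tail.
    return _STD_NORMAL.cdf(-pooled_z)


def fisher_pvalue(pvalues):
    """Pool P-values with the unweighted Fisher statistic -2 * sum(ln p_i).

    For k P-values the null distribution is chi-square with 2k degrees of
    freedom, whose upper tail has a closed form for even degrees of freedom.
    """
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("P-values must be non-empty and lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in pvalues)  # statistic divided by 2
    term, total = 1.0, 1.0
    for i in range(1, len(pvalues)):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    example_pvalues = [0.04, 0.20, 0.01]  # hypothetical per-dataset P-values
    example_weights = [1.0, 0.5, 2.0]     # hypothetical dataset weights
    print(weighted_z_pvalue(example_pvalues, example_weights))
    print(fisher_pvalue(example_pvalues))
```

Because both tail probabilities are evaluated in closed form, the cost does not grow as the significance level shrinks, which is the practical advantage over a simulated null distribution that the abstract describes.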
Keywords :
biology computing; data integration; statistical testing; Fisher statistics; Logit statistics; Pontillist; Z omnibus combination statistics; data integration methodology; p-values; simulated null distribution; statistical errors; statistical tests; system biology; weighted combination statistics; Approximation methods; Bioinformatics; Gaussian distribution; Gene expression; Memory management; Optimization; Random variables; Gene Expression; Optimal Weight; Pool P-value;
Conference_Title :
Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4577-1612-6
DOI :
10.1109/BIBMW.2011.6112380