Title :
A Generalized Multivariate Approach to Pattern Discovery from Replicated and Incomplete Genome-Wide Measurements
Author :
Zhu, Dongxiao ; Acharya, Lipi ; Zhang, Hui
Author_Institution :
Dept. of Comput. Sci., Univ. of New Orleans, New Orleans, LA, USA
Abstract :
Estimation of pairwise correlation from incomplete and replicated molecular profiling data is a ubiquitous problem in pattern discovery analysis, such as clustering and networking. However, existing methods solve this problem by ad hoc data imputation followed by average correlation coefficient type approaches, which might annihilate important patterns present in the molecular profiling data. Moreover, these approaches neither consider nor exploit the underlying experimental design information that specifies the replication mechanisms. We develop an Expectation-Maximization (EM) type algorithm to estimate the correlation structure from incomplete and replicated molecular profiling data with an a priori known replication mechanism. The approach is sufficiently general to apply to any known replication mechanism. When the replication mechanism is unknown, it reduces to the parsimonious model introduced previously. The efficacy of our approach was first evaluated by comprehensively comparing various bivariate and multivariate imputation approaches using simulation studies. Results from real-world data analysis further confirmed the superior performance of the proposed approach over commonly used approaches; there, we assessed the robustness of the method using data sets with up to 30 percent missing values.
Keywords :
bioinformatics; correlation theory; estimation theory; expectation-maximisation algorithm; genomics; molecular biophysics; pattern clustering; a priori known replication mechanism; ad hoc data imputation; average correlation coefficient type approach; bivariate imputation; clustering; expectation-maximization algorithm; generalized multivariate approach; incomplete genome wide measurements; multivariate imputation; networking; pairwise correlation estimation; pattern discovery; real world data analysis; replicated molecular profiling data; Bioinformatics; Biological system modeling; Computational biology; Computational modeling; Correlation; Indexes; Pairwise error probability; Replicated data; missing value; pairwise correlation; pattern recognition; unsupervised learning; Algorithms; Artificial Intelligence; Cluster Analysis; Computer Simulation; Databases, Genetic; Gene Expression Profiling; Genes, Fungal; Genomics; Multivariate Analysis; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Reproducibility of Results;
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2010.102