DocumentCode :
1521551
Title :
Hub Discovery in Partial Correlation Graphs
Author :
Hero, Alfred ; Rajaratnam, Bala
Author_Institution :
Departments of EECS, BME and Statistics, University of Michigan, Ann Arbor, U.S.A.
Volume :
58
Issue :
9
fYear :
2012
Firstpage :
6064
Lastpage :
6078
Abstract :
One of the most important problems in large-scale inference is the identification of variables that are highly dependent on several other variables. When dependence is measured by partial correlations, these variables correspond to the rows of the partial correlation matrix that have several entries of large magnitude, i.e., hubs in the associated partial correlation graph. This paper develops theory and algorithms for discovering such hubs from a few observations of these variables. We introduce a hub screening framework in which the user specifies both a minimum (partial) correlation $\rho$ and a minimum degree $\delta$ to screen the vertices. The choice of $\rho$ and $\delta$ can be guided by our mathematical expressions for the phase transition correlation threshold $\rho_{c}$ governing the average number of discoveries. It can also be guided by our asymptotic expressions for familywise discovery rates under the assumption of a large number $p$ of variables, a fixed number $n$ of multivariate samples, and weak dependence. Under the null hypothesis that the dispersion (covariance) matrix is sparse, these limiting expressions can be used to enforce familywise error constraints and to rank the discoveries in order of increasing statistical significance. For $n \ll p$, the computational complexity of the proposed partial correlation screening method is low, so the method is highly scalable. Thus, it can be applied to significantly larger problems than previous approaches. The theory is applied to discovering hubs in a high-dimensional gene microarray dataset.
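The screening rule described in the abstract (a vertex is a hub if at least $\delta$ of its partial correlations have magnitude at least $\rho$) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper works with sample partial correlations estimated from $n$ observations, whereas this sketch assumes a known precision (inverse covariance) matrix and converts it with the standard identity $\rho_{ij} = -\Omega_{ij}/\sqrt{\Omega_{ii}\Omega_{jj}}$. The function names `partial_correlations` and `screen_hubs` are hypothetical.

```python
import math


def partial_correlations(precision):
    """Partial correlation matrix from a precision (inverse covariance) matrix.

    Assumption: the precision matrix is known and positive definite; the
    paper instead estimates partial correlations from a few samples.
    """
    p = len(precision)
    pc = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                pc[i][j] = 1.0
            else:
                # Standard identity: rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj)
                pc[i][j] = -precision[i][j] / math.sqrt(
                    precision[i][i] * precision[j][j]
                )
    return pc


def screen_hubs(pc, rho, delta):
    """Return indices of vertices with at least `delta` neighbors where |pc_ij| >= rho."""
    hubs = []
    for i, row in enumerate(pc):
        # Degree of vertex i in the graph thresholded at rho (diagonal excluded).
        degree = sum(1 for j, r in enumerate(row) if j != i and abs(r) >= rho)
        if degree >= delta:
            hubs.append(i)
    return hubs


if __name__ == "__main__":
    # Hypothetical star-shaped precision matrix: vertex 0 is linked to 1, 2, 3.
    omega = [
        [1.0, -0.4, -0.4, -0.4],
        [-0.4, 1.0, 0.0, 0.0],
        [-0.4, 0.0, 1.0, 0.0],
        [-0.4, 0.0, 0.0, 1.0],
    ]
    pc = partial_correlations(omega)
    print(screen_hubs(pc, rho=0.3, delta=2))  # [0]
```

Raising $\delta$ from 1 to 2 in this example removes the leaf vertices and leaves only the center, which is the sense in which the minimum degree screens for hubs rather than for individual strong edges.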
Keywords :
Approximation methods; Artificial neural networks; Correlation; Covariance matrix; Dispersion; Matrix converters; Sparse matrices; $p$-value trajectories; Asymptotic Poisson limits; Gaussian graphical models (GGMs); correlation networks; discovery rate phase transitions; nearest neighbor dependence; node degree and connectivity
fLanguage :
English
Journal_Title :
Information Theory, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9448
Type :
jour
DOI :
10.1109/TIT.2012.2200825
Filename :
6203585