Title of article :
Sparse cluster analysis of large-scale discrete variables with application to single nucleotide polymorphism data
Author/Authors :
Baolin Wu, author
Issue Information :
Journal with serial issue number, year 2013
Abstract :
Currently, extremely large-scale genetic data present significant challenges for cluster analysis. Most of
the existing clustering methods are typically built on the Euclidean distance and geared toward analyzing
continuous response. They work well for clustering, e.g. microarray gene expression data, but often perform
poorly for clustering, e.g. large-scale single nucleotide polymorphism (SNP) data. In this paper, we study
the penalized latent class model for clustering extremely large-scale discrete data. The penalized latent
class model takes into account the discrete nature of the response using appropriate generalized linear
models and adopts the lasso penalized likelihood approach for simultaneous model estimation and selection
of important covariates. We develop very efficient numerical algorithms for model estimation based on
the iterative coordinate descent approach and further develop the expectation–maximization algorithm to
incorporate and model missing values. We use simulation studies and applications to the international
HapMap SNP data to illustrate the competitive performance of the penalized latent class model.
Keywords :
Sparse clustering , Clustering , Expectation–maximization algorithm , Latent class model , Lasso , K-means , Principal components , Single nucleotide polymorphism
Journal title :
JOURNAL OF APPLIED STATISTICS