• Title of article

    Sparse cluster analysis of large-scale discrete variables with application to single nucleotide polymorphism data

  • Author/Authors

    Baolin Wu

  • Issue Information
    Serial issue, year 2013
  • Pages
    10
  • From page
    358
  • To page
    367
  • Abstract
    Currently, extremely large-scale genetic data present significant challenges for cluster analysis. Most existing clustering methods are built on the Euclidean distance and geared toward analyzing continuous responses. They work well for clustering, e.g., microarray gene expression data, but often perform poorly for clustering, e.g., large-scale single nucleotide polymorphism (SNP) data. In this paper, we study the penalized latent class model for clustering extremely large-scale discrete data. The penalized latent class model takes into account the discrete nature of the response using appropriate generalized linear models, and it adopts the lasso penalized likelihood approach for simultaneous model estimation and selection of important covariates. We develop very efficient numerical algorithms for model estimation based on the iterative coordinate descent approach, and we further develop the expectation–maximization algorithm to incorporate and model missing values. We use simulation studies and applications to the International HapMap SNP data to illustrate the competitive performance of the penalized latent class model.
  • Keywords
    Sparse clustering, Clustering, Expectation–maximization algorithm, Latent class model, Lasso, K-means, Principal components, Single nucleotide polymorphism
  • Journal title
    JOURNAL OF APPLIED STATISTICS
  • Serial Year
    2013
  • Record number
    712917
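The abstract describes the method only at a high level. As a rough illustration, here is a minimal Python/NumPy sketch of a lasso-penalized latent class model. It is fitted by EM, with coordinate-descent M-steps. The sketch rests on several assumptions that the record does not state:

- genotypes are coded as minor-allele counts 0/1/2 and modeled as Binomial(2, p) with a logit link;
- each SNP has an intercept `mu_j` plus class-specific effects `theta_jk`, and only the `theta_jk` carry an L1 penalty;
- missing genotypes are simply dropped from the likelihood, which treats them as missing at random.

All names (`penalized_lcm`, `lam`, `warmup`, `sweeps`) are hypothetical. This is not the author's implementation.

```python
import numpy as np

def sigmoid(eta):
    # Numerically stable logistic function: 1 / (1 + exp(-eta)).
    return np.exp(-np.logaddexp(0.0, -eta))

def soft_threshold(z, t):
    # Lasso proximal operator: shrink toward 0 by t, exactly 0 if |z| <= t.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def penalized_lcm(X, K, lam, n_trials=2, n_iter=100, warmup=10, sweeps=5, seed=0):
    """Hypothetical sketch of a lasso-penalized latent class model.

    X: (n, p) genotype counts in {0, ..., n_trials}; np.nan marks missing.
    Assumed model: x_ij | class k ~ Binomial(n_trials, sigmoid(mu_j + theta_jk)),
    penalized by lam * sum |theta_jk|. A SNP whose theta row is all zero
    has the same distribution in every class, so it does not affect clustering.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = ~np.isnan(X)
    Xf = np.where(obs, X, 0.0)              # missing entries contribute nothing
    R = rng.dirichlet(np.ones(K), size=n)   # random soft initial memberships
    mu, theta = np.zeros(p), np.zeros((p, K))
    for it in range(n_iter):
        cur_lam = 0.0 if it < warmup else lam  # unpenalized warm-up (assumption)
        # ---- M-step: expected sufficient statistics over observed entries ----
        pi = R.mean(axis=0)
        W = (R.T @ obs.astype(float)).T     # (p, K): sum_i r_ik * 1[x_ij observed]
        S = (R.T @ Xf).T                    # (p, K): sum_i r_ik * x_ij
        for _ in range(sweeps):             # coordinate-descent sweeps
            # Newton step for each unpenalized intercept mu_j, theta held fixed
            prob = sigmoid(mu[:, None] + theta)
            grad = (S - n_trials * W * prob).sum(axis=1)
            hess = (n_trials * W * prob * (1 - prob)).sum(axis=1) + 1e-8
            mu += grad / hess
            # Quadratic approximation + soft-thresholding for each column theta[:, k]
            for k in range(K):
                prob = sigmoid(mu + theta[:, k])
                g = S[:, k] - n_trials * W[:, k] * prob
                h = n_trials * W[:, k] * prob * (1 - prob) + 1e-8
                theta[:, k] = soft_threshold(h * theta[:, k] + g, cur_lam) / h
        # ---- E-step: posterior class memberships (binomial coefficient cancels) ----
        eta = mu[:, None] + theta
        log_p, log_q = -np.logaddexp(0.0, -eta), -np.logaddexp(0.0, eta)
        ll = Xf @ log_p + ((n_trials - Xf) * obs) @ log_q + np.log(pi + 1e-300)
        ll -= ll.max(axis=1, keepdims=True)
        R = np.exp(ll)
        R /= R.sum(axis=1, keepdims=True)
    return pi, mu, theta, R
```

SNPs whose row of `theta` is entirely zero are screened out of the clustering. This is the kind of covariate selection the lasso penalty is meant to provide.

The short unpenalized warm-up is a design choice of this sketch, not something taken from the paper. Under a random start, every class effect is pure noise. Applying the penalty immediately can shrink all of these effects to zero, which leaves EM stuck at the trivial one-cluster solution.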