• Title of article

    A statistical view of clustering performance through the theory of U-processes

  • Author/Authors

    Clémençon, Stéphan

  • Issue Information
    Biannual, with consecutive issue numbering, year 2014
  • Pages
    15
  • From page
    42
  • To page
    56
  • Abstract
    Many clustering techniques aim at optimizing empirical criteria that take the form of a U-statistic of degree two. Given a measure of dissimilarity between pairs of observations, the goal is to minimize the within-cluster point scatter over a class of partitions of the feature space. It is the purpose of this paper to define a general statistical framework, relying on the theory of U-processes, for studying the performance of such clustering methods. In this setup, under adequate assumptions on the complexity of the subsets forming the partition candidates, the excess of clustering risk of the empirical minimizer is proved to be of the order O_P(1/√n). A lower bound result shows that the rate obtained is optimal in a minimax sense. Based on recent results related to the tail behavior of degenerate U-processes, it is also shown how to establish tighter, and even faster, rate bounds under additional assumptions. Model selection issues, related to the number of clusters forming the data partition in particular, are also considered. Finally, it is explained how the theoretical results developed here can provide statistical guarantees for empirical clustering aggregation.
  • Keywords
    U-process, Empirical risk minimization, Fast rates, Minimax lower bound, Median clustering, Cluster analysis, Pairwise dissimilarity
  • Journal title
    Journal of Multivariate Analysis
  • Serial Year
    2014
  • Record number

    1566560