• DocumentCode
    67622
  • Title
    Adaptive Noisy Clustering
  • Author
    Chichignoud, Michael; Loustau, Sebastien
  • Author_Institution
    ETH Zurich, Zurich, Switzerland
  • Volume
    60
  • Issue
    11
  • fYear
    2014
  • fDate
    Nov. 2014
  • Firstpage
    7279
  • Lastpage
    7292
  • Abstract
    The problem of adaptive noisy clustering is investigated. Given a set of noisy observations Z_i = X_i + ε_i, i = 1, …, n, the goal is to design clusters associated with the law of the X_i, which has an unknown density f with respect to the Lebesgue measure. Since only a corrupted sample is observed, a direct approach such as the popular k-means is not suitable in this case. In this paper, we propose a noisy k-means minimization, which is based on the k-means loss function and a deconvolution estimator of the density f. In particular, this approach depends on the bandwidth of the deconvolution kernel. Fast rates of convergence for the excess risk are obtained for a particular choice of the bandwidth, which depends on the smoothness of the density f. We then turn to the main issue of this paper: the data-driven choice of the bandwidth. We state an adaptive upper bound using a modified version of Lepski's method, called Empirical Risk Comparison, in which empirical risks associated with different bandwidths are compared. Finally, we illustrate that the selection rule can be used in many statistical problems of M-estimation where the empirical risk depends on a nuisance parameter.
  • Keywords
    adaptive estimation; deconvolution; minimisation; nonparametric statistics; pattern clustering; Lebesgue measure; Lepski method; M-estimation; adaptive noisy clustering; adaptive upper bound; data-driven choice; deconvolution estimator; deconvolution kernel; empirical risk comparison; excess risk; k-means loss function; noisy k-means minimization; noisy observations; selection rule; statistical problems; Bandwidth; Convergence; Deconvolution; Estimation; Kernel; Noise measurement; Standards; Adaptivity; M-estimation; deconvolution; errors-in-variables; fast rates; statistical learning
  • fLanguage
    English
  • Journal_Title
    Information Theory, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9448
  • Type
    jour
  • DOI
    10.1109/TIT.2014.2356577
  • Filename
    6898023