• DocumentCode
    1047830
  • Title

    On the Performance of Clustering in Hilbert Spaces

  • Author

    Biau, Gérard ; Devroye, Luc ; Lugosi, Gábor

  • Author_Institution
    LSTA & LPMA, Univ. Pierre et Marie Curie-Paris VI, Paris, France
  • Volume
    54
  • Issue
    2
  • fYear
    2008
  • Firstpage
    781
  • Lastpage
    790
  • Abstract
    Based on randomly drawn vectors in a separable Hilbert space, one may construct a k-means clustering scheme by minimizing an empirical squared error. We investigate the risk of such a clustering scheme, defined as the expected squared distance of a random vector X from the set of cluster centers. Our main result states that, for an almost surely bounded X, the expected excess clustering risk is $O(\sqrt{1/n})$. Since clustering in high (or even infinite)-dimensional spaces may lead to severe computational problems, we examine the properties of a dimension reduction strategy for clustering based on Johnson-Lindenstrauss-type random projections. Our results reflect a tradeoff between accuracy and computational complexity when one uses k-means clustering after random projection of the data to a low-dimensional space. We argue that random projections work better than other simplistic dimension reduction schemes.
  • Keywords
    Hilbert spaces; errors; minimisation; vector quantisation; Johnson-Lindenstrauss-type random projections; dimension reduction strategy; empirical squared error minimization; expected squared distance; k-means clustering scheme; random vector; Biology; Computational complexity; Computer science; Data analysis; Data compression; Hilbert space; Kernel; Risk management; Unsupervised learning; Vector quantization; $k$-means; Clustering; empirical risk minimization; random projections
  • fLanguage
    English
  • Journal_Title
    Information Theory, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9448
  • Type

    jour

  • DOI
    10.1109/TIT.2007.913516
  • Filename
    4439834