• DocumentCode
    1048146
  • Title
    Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization
  • Author
    Lafon, S.; Lee, A.B.
  • Author_Institution
    Google Inc., Mountain View, CA
  • Volume
    28
  • Issue
    9
  • fYear
    2006
  • Firstpage
    1393
  • Lastpage
    1403
  • Abstract
    We provide evidence that nonlinear dimensionality reduction, clustering, and data set parameterization can be solved within one and the same framework. The main idea is to define a system of coordinates with an explicit metric that reflects the connectivity of a given data set and that is robust to noise. Our construction, which is based on a Markov random walk on the data, offers a general scheme for simultaneously reorganizing and subsampling graphs and arbitrarily shaped data sets in high dimensions using intrinsic geometry. We show that clustering in embedding spaces is equivalent to compressing operators. The objective of data partitioning and clustering is to coarse-grain the random walk on the data while at the same time preserving a diffusion operator for the intrinsic geometry or connectivity of the data set up to some accuracy. We show that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusion space and a precise measure of the performance of general clustering algorithms.
  • Keywords
    Markov processes; geometry; pattern clustering; random processes; Markov random walk; coarse-graining; data set parameterization; diffusion maps; graph partitioning; intrinsic geometry; k-means clustering; nonlinear dimensionality reduction; Clustering algorithms; Distortion measurement; Eigenvalues and eigenfunctions; Extraterrestrial measurements; Geometry; Noise robustness; Noise shaping; Nonlinear distortion; Quantization; Text analysis; Machine learning; clustering; clustering similarity measures; compression (coding); graph algorithms; graph-theoretic methods; information visualization; knowledge retrieval; quantization; text analysis; Algorithms; Artificial Intelligence; Cluster Analysis; Computer Simulation; Databases, Factual; Information Storage and Retrieval; Models, Statistical; Pattern Recognition, Automated; Signal Processing, Computer-Assisted
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2006.184
  • Filename
    1661543