• DocumentCode
    3437950
  • Title
    Dimensionality, Discriminability, Density and Distance Distributions
  • Author
    Houle, Michael E.
  • Author_Institution
    Nat. Inst. of Inf., Tokyo, Japan
  • fYear
    2013
  • fDate
    7-10 Dec. 2013
  • Firstpage
    468
  • Lastpage
    473
  • Abstract
    For many large-scale applications in data mining, machine learning, and multimedia, fundamental operations such as similarity search, retrieval, classification, clustering, and anomaly detection generally suffer from an effect known as the 'curse of dimensionality'. As the dimensionality of the data increases, distance values tend to become less discriminative, due to their increasing relative concentration about the mean of their distribution. For this reason, researchers have considered the analysis of structures and methods in terms of measures of the intrinsic dimensionality of the data sets. This paper is concerned with a generalization of a discrete measure of intrinsic dimensionality, the expansion dimension, to the case of continuous distance distributions. This notion of the intrinsic dimensionality of a distribution is shown to precisely coincide with a natural notion of the indiscriminability of distances and features. Furthermore, for any distance distribution with differentiable cumulative density function, a fundamental relationship is shown to exist between probability density, the cumulative density (cumulative probability divided by distance), intrinsic dimensionality, and discriminability.
  • Keywords
    data mining; information retrieval; learning (artificial intelligence); anomaly detection; continuous distance distributions; differentiable cumulative density function; dimensionality curse; distance distributions; intrinsic data set dimensionality; large-scale applications; machine learning; multimedia; probability density; similarity search; Density functional theory; Distribution functions; Joints; Random variables; Search problems; Size measurement; Vectors; discriminability; distance distribution; features; intrinsic dimensionality; statistical copula
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on
  • Conference_Location
    Dallas, TX
  • Print_ISBN
    978-1-4799-3143-9
  • Type
    conf
  • DOI
    10.1109/ICDMW.2013.139
  • Filename
    6753958