Title :
Two key properties of dimensionality reduction methods
Author :
Lee, John A. ; Verleysen, Michel
Author_Institution :
IREC Inst., Univ. Catholique de Louvain, Brussels, Belgium
Abstract :
Dimensionality reduction aims at providing faithful low-dimensional representations of high-dimensional data. Its general principle is to reproduce in a low-dimensional space the salient characteristics of the data, such as proximities. A large variety of methods exist in the literature, ranging from principal component analysis to deep neural networks with a bottleneck layer. In this cornucopia, it is rather difficult to find out why a few methods clearly outperform others. This paper identifies two important properties that enable some recent methods, like stochastic neighbor embedding and its variants, to produce improved visualizations of high-dimensional data. The first property is a low sensitivity to the phenomenon of distance concentration. The second one is plasticity, that is, the capability to forget some data characteristics in order to better reproduce the others. From a manifold learning perspective, breaking some proximities typically allows for a better unfolding of the data. Theoretical developments as well as experiments support our claim that both properties have a strong impact. In particular, we show that equipping classical methods with the missing properties significantly improves their results.
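The distance-concentration phenomenon mentioned in the abstract can be illustrated with a short sketch (not from the paper): as dimensionality grows, pairwise Euclidean distances between i.i.d. random points become nearly equal, so the relative contrast between the farthest and nearest neighbor shrinks, which degrades distance-based methods.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(dim, n_points=200):
    """Relative contrast (max - min) / min of distances from one
    reference point to the others, for i.i.d. Gaussian data."""
    X = rng.standard_normal((n_points, dim))
    dists = np.linalg.norm(X[1:] - X[0], axis=1)
    return (dists.max() - dists.min()) / dists.min()

for d in (2, 10, 100, 1000):
    print(f"dim={d:5d}  relative contrast={relative_contrast(d):.3f}")
```

The printed contrast drops sharply with dimension, which is the concentration effect that SNE-like methods are argued to be less sensitive to.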
Keywords :
data reduction; data structures; neural nets; principal component analysis; DR; data representation; deep neural networks; dimensionality reduction; Cost function; Covariance matrices; Force; Manifolds; Plastics; Vectors;
Conference_Titel :
2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)
Conference_Location :
Orlando, FL
DOI :
10.1109/CIDM.2014.7008663