Title :
Shrinkage Fisher information embedding of high dimensional feature distributions
Author :
Chen, Xu ; Chen, Yilun ; Hero, Alfred
Author_Institution :
Dept. of EECS, Univ. of Michigan, Ann Arbor, MI, USA
Abstract :
In this paper, we introduce a dimensionality reduction method that can be applied to clustering of high-dimensional empirical distributions. The proposed approach is based on a stabilized information-geometric representation of the feature distributions. The problem of dimensionality reduction on spaces of distribution functions arises in many applications, including hyperspectral imaging, document clustering, and classification of flow cytometry data. Our method is a shrinkage-regularized version of the Fisher information distance, which we call shrinkage FINE (sFINE), implemented by Steinian shrinkage estimation of the matrix of Kullback-Leibler distances between feature distributions. The proposed method computes similarities using the shrinkage-regularized Fisher information distance between probability density functions (PDFs) of the data features, then applies Laplacian eigenmaps to a derived similarity matrix to accomplish the embedding and perform clustering. The shrinkage regularization controls the trade-off between bias and variance and is especially well suited to clustering empirical probability distributions of high-dimensional data sets. We also show significant gains in clustering performance on both the UCI data set and a spam data set. Finally, we demonstrate the superiority of embedding and clustering distributional data using sFINE as compared to other state-of-the-art methods such as non-parametric information clustering, support vector machines (SVM), and sparse K-means.
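The pipeline described in the abstract (pairwise Kullback-Leibler distances, shrinkage of the distance matrix, Laplacian eigenmaps on a derived similarity matrix) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the shrinkage target (the off-diagonal mean) and fixed coefficient `lam` stand in for the Steinian shrinkage estimator, the exponential kernel with a median bandwidth stands in for the derived similarity matrix, and the toy histograms are synthetic.

```python
import numpy as np

def symmetric_kl_matrix(P, eps=1e-12):
    """Pairwise symmetrized KL divergence between the rows of P (discrete PDFs)."""
    P = np.asarray(P, dtype=float) + eps
    P = P / P.sum(axis=1, keepdims=True)
    logP = np.log(P)
    self_term = np.sum(P * logP, axis=1)   # sum_k p_ik log p_ik
    cross = P @ logP.T                     # cross[i, j] = sum_k p_ik log p_jk
    kl = self_term[:, None] - cross        # kl[i, j] = KL(p_i || p_j)
    return kl + kl.T

def shrink_toward_mean(D, lam):
    """Assumed stand-in for shrinkage: pull off-diagonal entries toward their mean."""
    off = ~np.eye(D.shape[0], dtype=bool)
    target = np.where(off, D[off].mean(), 0.0)
    return (1.0 - lam) * D + lam * target

def laplacian_eigenmaps(W, dim):
    """Embed with the smallest nontrivial eigenvectors of the normalized Laplacian."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)        # eigenvalues in ascending order
    # Column 0 is the trivial eigenvector (eigenvalue 0), so skip it.
    return d_inv_sqrt[:, None] * vecs[:, 1:dim + 1]

def embed_distributions(P, lam=0.2, sigma=None, dim=2):
    D = np.clip(symmetric_kl_matrix(P), 0.0, None)
    D = shrink_toward_mean(D, lam)
    if sigma is None:
        sigma = np.median(D[D > 0])        # assumed bandwidth heuristic
    W = np.exp(-D / sigma)                 # assumed similarity kernel
    return laplacian_eigenmaps(W, dim)

# Toy data: two groups of 10 noisy histograms over 4 bins (synthetic).
rng = np.random.default_rng(0)
group_a = rng.dirichlet(200 * np.array([0.7, 0.1, 0.1, 0.1]), size=10)
group_b = rng.dirichlet(200 * np.array([0.1, 0.1, 0.1, 0.7]), size=10)
P = np.vstack([group_a, group_b])
Y = embed_distributions(P, dim=1)
```

In this toy setting the sign of the single embedding coordinate in `Y` can serve as a cluster label; with real data one would typically keep more dimensions and run a clustering step such as K-means on the embedded points.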
Keywords :
matrix algebra; pattern clustering; Fisher information distance; Kullback-Leibler distances; Laplacian eigenmaps; PDF; Steinian shrinkage estimation; dimensionality reduction; document clustering; flow cytometry data; high dimensional feature distributions; hyperspectral imaging; information geometrical representation; matrix; probability density functions; sFINE; shrinkage FINE; Clustering algorithms; Electronic mail; Estimation; Indexes; Principal component analysis; Servers; Support vector machines;
Conference_Title :
Signals, Systems and Computers (ASILOMAR), 2011 Conference Record of the Forty Fifth Asilomar Conference on
Conference_Location :
Pacific Grove, CA
Print_ISBN :
978-1-4673-0321-7
DOI :
10.1109/ACSSC.2011.6190349