Title :
Beyond Redundancies: A Metric-Invariant Method for Unsupervised Feature Selection
Author :
Hou, Yuexian ; Zhang, Peng ; Yan, Tingxu ; Li, Wenjie ; Song, Dawei
Author_Institution :
Sch. of Comput. Sci. & Technol., Tianjin Univ., Tianjin, China
fDate :
3/1/2010 12:00:00 AM
Abstract :
A fundamental goal of unsupervised feature selection is denoising, which aims to identify and reduce noisy features that are not discriminative. Because no information about the real classes is available, denoising is a challenging task. Noisy features can distort an otherwise reasonable distance metric and produce unreasonable feature spaces, i.e., feature spaces in which common clustering algorithms cannot effectively find the real classes. To overcome this problem, we start from the observation that the relevance of features is intrinsic and independent of any metric scaling on the feature space. This observation implies that feature selection should be invariant, at least to some extent, with respect to metric scaling. In this paper, we clarify the necessity of considering metric invariance in unsupervised feature selection and propose a novel model that incorporates it. Our proposed method is motivated by the following observation: if the statistic that guides the unsupervised feature selection process is invariant with respect to possible metric scalings, the solution of the model will also be invariant. Hence, if a metric-invariant model can distinguish discriminative features from noisy ones in a reasonable feature space, it will also work in the unreasonable counterpart obtained from the reasonable one by metric scaling. A theoretical justification of the metric invariance of our proposed model is given, and the empirical evaluation demonstrates its promising performance.
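The invariance argument in the abstract can be illustrated with a small sketch. It is not the model proposed in the paper, whose statistic is not given in this record. As stand-ins, it compares per-feature variance (which changes under metric scaling) with negative excess kurtosis (which does not, and which is high for a bimodal, cluster-revealing feature). The synthetic data, the function names, and the choice of kurtosis are all assumptions made for illustration.

```python
import numpy as np

def variance_score(X):
    # Per-feature variance: depends on the units (scale) of each feature.
    return X.var(axis=0)

def kurtosis_score(X):
    # Negative excess kurtosis of each standardized feature.
    # Standardizing first makes the score invariant to per-feature scaling.
    # Illustrative stand-in only, not the statistic used in the paper.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return -(np.mean(Z ** 4, axis=0) - 3.0)

def select_top(scores, k=1):
    # Indices of the k highest-scoring features, in sorted order.
    return sorted(np.argsort(scores)[::-1][:k].tolist())

rng = np.random.default_rng(0)
n = 2000
labels = rng.integers(0, 2, size=n)  # hidden classes, never shown to the selector

# Feature 0 separates the two hidden classes; feature 1 is pure noise.
discriminative = np.where(labels == 0, -2.0, 2.0) + rng.normal(0.0, 0.5, n)
noise = rng.normal(0.0, 1.0, n)
X = np.column_stack([discriminative, noise])

# "Unreasonable" counterpart: the noise feature is stretched by metric scaling.
X_scaled = X * np.array([1.0, 100.0])

print("variance, original:", select_top(variance_score(X)))
print("variance, scaled:  ", select_top(variance_score(X_scaled)))
print("kurtosis, original:", select_top(kurtosis_score(X)))
print("kurtosis, scaled:  ", select_top(kurtosis_score(X_scaled)))
```

Under these assumptions, the variance-based choice flips to the noise feature once it is rescaled, while the scale-invariant score gives the same selection in both spaces, which is the behavior the abstract argues a metric-invariant model should have.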
Keywords :
feature extraction; information theory; interference suppression; matrix algebra; pattern clustering; statistical analysis; clustering algorithms; denoising; distance metric; metric scaling; metric-invariant statistics; unreasonable feature spaces; unsupervised feature selection; Decorrelation; Extraterrestrial measurements; Measurement units; Noise level; Noise reduction; Statistics; Unsupervised learning; Vocabulary; Feature evaluation and selection; metric invariant
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2009.84