DocumentCode
739403
Title
Unsupervised Discovery of Subspace Trends
Author
Yan Xu ; Peng Qiu ; Roysam, Badrinath
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of Houston, Houston, TX, USA
Volume
37
Issue
10
fYear
2015
Firstpage
2131
Lastpage
2145
Abstract
This paper presents unsupervised algorithms for discovering previously unknown subspace trends in high-dimensional data sets without the benefit of prior information. A subspace trend is a sustained pattern of gradual/progressive changes within an unknown subset of feature dimensions. A fundamental challenge to subspace trend discovery is the presence of irrelevant data dimensions, noise, outliers, and confusion from multiple subspace trends that are driven by independent factors and mixed in with each other. These factors can obscure the trends in conventional dimension-reduction and projection-based data visualizations. To overcome these limitations, we propose a novel graph-theoretic neighborhood similarity measure for detecting concordant progressive changes across data dimensions. Using this measure, we present an unsupervised algorithm for trend-relevant feature selection, subspace trend discovery, quantification of trend strength, and validation. Our method successfully identified verifiable subspace trends in diverse synthetic and real-world biomedical datasets. Visualizations derived from the selected trend-relevant features revealed biologically meaningful hidden subspace trend(s) that were obscured by irrelevant features and noise. Although our examples are drawn from the biological domain, the proposed algorithm is broadly applicable to exploratory analysis of high-dimensional data, including visualization, hypothesis generation, knowledge discovery, and prediction in other diverse applications.
Keywords
data analysis; feature selection; graph theory; pattern classification; concordant progressive change detection; graph-theoretic neighborhood similarity measure; high-dimensional data sets; subspace trend discovery; trend strength quantification; trend validation; trend-relevant feature selection; trend-relevant features; unsupervised algorithms; Algorithm design and analysis; Clustering algorithms; Data visualization; Erbium; Gene expression; Market research; Noise; Multivariate Data Visualization
fLanguage
English
Journal_Title
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher
IEEE
ISSN
0162-8828
Type
jour
DOI
10.1109/TPAMI.2015.2394475
Filename
7015603