• DocumentCode
    739403
  • Title
    Unsupervised Discovery of Subspace Trends
  • Author
    Yan Xu; Peng Qiu; Badrinath Roysam
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Houston, Houston, TX, USA
  • Volume
    37
  • Issue
    10
  • fYear
    2015
  • Firstpage
    2131
  • Lastpage
    2145
  • Abstract
    This paper presents unsupervised algorithms for discovering previously unknown subspace trends in high-dimensional data sets without the benefit of prior information. A subspace trend is a sustained pattern of gradual/progressive changes within an unknown subset of feature dimensions. A fundamental challenge to subspace trend discovery is the presence of irrelevant data dimensions, noise, outliers, and confusion from multiple subspace trends driven by independent factors that are mixed in with each other. These factors can obscure the trends in conventional dimension-reduction and projection-based data visualizations. To overcome these limitations, we propose a novel graph-theoretic neighborhood similarity measure for detecting concordant progressive changes across data dimensions. Using this measure, we present an unsupervised algorithm for trend-relevant feature selection, subspace trend discovery, quantification of trend strength, and validation. Our method successfully identified verifiable subspace trends in diverse synthetic and real-world biomedical datasets. Visualizations derived from the selected trend-relevant features revealed biologically meaningful hidden subspace trends that were obscured by irrelevant features and noise. Although our examples are drawn from the biological domain, the proposed algorithm is broadly applicable to exploratory analysis of high-dimensional data, including visualization, hypothesis generation, knowledge discovery, and prediction in diverse other applications.
  • Keywords
    data analysis; feature selection; graph theory; pattern classification; concordant progressive change detection; graph-theoretic neighborhood similarity measure; high-dimensional data sets; subspace trend discovery; trend strength quantification; trend validation; trend-relevant feature selection; trend-relevant features; unsupervised algorithms; Algorithm design and analysis; Clustering algorithms; Data visualization; Erbium; Gene expression; Market research; Noise; Multivariate Data Visualization
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2015.2394475
  • Filename
    7015603