• DocumentCode
    2208321
  • Title
    Discovering Correlated Subspace Clusters in 3D Continuous-Valued Data
  • Author
    Sim, Kelvin; Aung, Zeyar; Gopalkrishnan, Vivekanand
  • Author_Institution
    Inst. for Infocomm Res., A*STAR, Singapore, Singapore
  • fYear
    2010
  • fDate
    13-17 Dec. 2010
  • Firstpage
    471
  • Lastpage
    480
  • Abstract
    Subspace clusters represent useful information in high-dimensional data. However, mining significant subspace clusters in continuous-valued 3D data, such as stock-financial ratio-year data or gene-sample-time data, is difficult. First, typical metrics either find subspaces with very few objects or find too many insignificant subspaces, i.e., those that exist by chance. Second, typical 3D subspace clustering approaches abound with parameters, which are usually set under biased assumptions, making the mining process a 'guessing game'. We address these concerns by proposing an information-theoretic measure, which allows us to identify 3D subspace clusters that stand out from the data. We also develop a highly effective, efficient and parameter-robust algorithm, a hybrid of information-theoretic and statistical techniques, to mine these clusters. Through extensive experiments, we show that our approach can discover significant 3D subspace clusters embedded in 110 synthetic datasets of varying conditions. We also perform a case study on real-world stock datasets, which shows that our clusters can generate higher profits than those mined by other approaches.
  • Keywords
    data analysis; data mining; pattern clustering; 3D subspace clustering approaches; continuous valued 3D data; correlated subspace cluster; information hybrid; mining process; parameter robust algorithm; statistical technique; stock dataset; 3D subspace clustering; financial data mining; information theory
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2010 IEEE 10th International Conference on
  • Conference_Location
    Sydney, NSW
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4244-9131-5
  • Electronic_ISBN
    1550-4786
  • Type
    conf
  • DOI
    10.1109/ICDM.2010.19
  • Filename
    5694001