• DocumentCode
    2131138
  • Title
    Hunting for Coherent Co-clusters in High Dimensional and Noisy Datasets
  • Author
    Deodhar, Meghana; Ghosh, Joydeep; Gupta, Gunjan; Cho, Hyuk; Dhillon, Inderjit
  • Author_Institution
    Dept. of ECE, Univ. of Texas, Austin, TX
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    654
  • Lastpage
    663
  • Abstract
    Clustering problems often involve datasets where only part of the data is relevant to the problem; e.g., in microarray data analysis, only a subset of the genes show cohesive expressions within a subset of the conditions/features. The existence of a large number of non-informative data points and features makes it challenging to hunt for coherent and meaningful clusters in such datasets. Additionally, since clusters could exist in different subspaces of the feature space, a co-clustering algorithm that simultaneously clusters objects and features is often more suitable than one restricted to traditional "one-sided" clustering. We propose Robust Overlapping Co-clustering (ROCC), a scalable and versatile framework that addresses the problem of efficiently mining dense, arbitrarily positioned, possibly overlapping co-clusters from large, noisy datasets. ROCC has several desirable properties that make it well suited to a number of real-life applications. Through extensive experimentation, we show that our approach is significantly more accurate in identifying biologically meaningful co-clusters in microarray data than several other prominent approaches that have been applied to this task. We also point out other interesting applications of the proposed framework in solving difficult clustering problems.
  • Keywords
    data analysis; pattern clustering; biologically meaningful coclusters; clustering problems; coclustering algorithm; coherent coclusters; cohesive expressions; feature space; high dimensional datasets; large noisy datasets; microarray data analysis; noninformative data points; one-sided clustering; robust overlapping coclustering; scalable framework; very versatile framework; Clustering algorithms; Conferences; Data analysis; Data mining; Iterative algorithms; Lapping; Optical noise; Robustness; Spatial databases; USA Councils; cluster mining; co-clustering; microarray data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
  • Conference_Location
    Pisa
  • Print_ISBN
    978-0-7695-3503-6
  • Electronic_ISBN
    978-0-7695-3503-6
  • Type
    conf
  • DOI
    10.1109/ICDMW.2008.20
  • Filename
    4733991