Title :
Hunting for Coherent Co-clusters in High Dimensional and Noisy Datasets
Author :
Deodhar, Meghana; Ghosh, Joydeep; Gupta, Gunjan; Cho, Hyuk; Dhillon, Inderjit
Author_Institution :
Dept. of ECE, Univ. of Texas, Austin, TX
Abstract :
Clustering problems often involve datasets where only part of the data is relevant to the problem; e.g., in microarray data analysis, only a subset of the genes show cohesive expression patterns within a subset of the conditions/features. The presence of a large number of non-informative data points and features makes it challenging to hunt for coherent and meaningful clusters in such datasets. Additionally, since clusters may exist in different subspaces of the feature space, a co-clustering algorithm that simultaneously clusters objects and features is often more suitable than one restricted to traditional "one-sided" clustering. We propose Robust Overlapping Co-clustering (ROCC), a scalable and versatile framework that addresses the problem of efficiently mining dense, arbitrarily positioned, possibly overlapping co-clusters from large, noisy datasets. ROCC has several desirable properties that make it well suited to a number of real-life applications. Through extensive experimentation, we show that our approach is significantly more accurate in identifying biologically meaningful co-clusters in microarray data than several other prominent approaches that have been applied to this task. We also point out other interesting applications of the proposed framework to difficult clustering problems.
Keywords :
data analysis; pattern clustering; biologically meaningful coclusters; clustering problems; coclustering algorithm; coherent coclusters; cohesive expressions; feature space; high dimensional datasets; large noisy datasets; microarray data analysis; noninformative data points; one-sided clustering; robust overlapping coclustering; scalable framework; versatile framework; Clustering algorithms; Data analysis; Data mining; Iterative algorithms; Robustness; cluster mining; co-clustering; microarray data
Conference_Title :
Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3503-6
Electronic_ISBN :
978-0-7695-3503-6
DOI :
10.1109/ICDMW.2008.20