DocumentCode
2131138
Title
Hunting for Coherent Co-clusters in High Dimensional and Noisy Datasets
Author
Deodhar, Meghana ; Ghosh, Joydeep ; Gupta, Gunjan ; Cho, Hyuk ; Dhillon, Inderjit
Author_Institution
Dept. of ECE, Univ. of Texas, Austin, TX
fYear
2008
fDate
15-19 Dec. 2008
Firstpage
654
Lastpage
663
Abstract
Clustering problems often involve datasets in which only part of the data is relevant to the problem; e.g., in microarray data analysis, only a subset of the genes shows cohesive expression within a subset of the conditions/features. The presence of a large number of non-informative data points and features makes it challenging to hunt for coherent and meaningful clusters in such datasets. Additionally, since clusters could exist in different subspaces of the feature space, a co-clustering algorithm that simultaneously clusters objects and features is often more suitable than one that is restricted to traditional "one-sided" clustering. We propose Robust Overlapping Co-clustering (ROCC), a scalable and very versatile framework that addresses the problem of efficiently mining dense, arbitrarily positioned, possibly overlapping co-clusters from large, noisy datasets. ROCC has several desirable properties that make it extremely well suited to a number of real-life applications. Through extensive experimentation, we show that our approach is significantly more accurate in identifying biologically meaningful co-clusters in microarray data than several other prominent approaches that have been applied to this task. We also point out other interesting applications of the proposed framework in solving difficult clustering problems.
Keywords
data analysis; pattern clustering; biologically meaningful coclusters; clustering problems; coclustering algorithm; coherent coclusters; cohesive expressions; feature space; high dimensional datasets; large noisy datasets; microarray data analysis; noninformative data points; one-sided clustering; robust overlapping coclustering; scalable framework; very versatile framework; Clustering algorithms; Conferences; Data analysis; Data mining; Iterative algorithms; Lapping; Optical noise; Robustness; Spatial databases; USA Councils; cluster mining; co-clustering; microarray data
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
Conference_Location
Pisa
Print_ISBN
978-0-7695-3503-6
Electronic_ISBN
978-0-7695-3503-6
Type
conf
DOI
10.1109/ICDMW.2008.20
Filename
4733991