DocumentCode :
3519329
Title :
Heart of the Matter: Discovering the Consensus of Multiple Clustering Results
Author :
Kosorukoff, Alex ; Sinha, Saurabh
Author_Institution :
Dept. of Comput. Sci., Univ. of Illinois, Urbana, IL
fYear :
2008
fDate :
3-5 Nov. 2008
Firstpage :
155
Lastpage :
162
Abstract :
Clustering is widely used by genomics researchers to discover functional patterns in data. The inherent subjectivity and hardness of the clustering task often lead researchers to explore multiple clustering results of the same data, using different algorithms and parameter settings. This further necessitates a method to automatically summarize multiple clustering results. A natural question to ask about several clustering results is "what is the structure they all have in common?" This work presents a computational method to answer this question. We provide a precise formulation of the problem of computing the consensus of several clusterings, examine its computational complexity and find the problem to be NP-hard. We describe a greedy heuristic to solve the problem, and assess its performance on synthetic data. We demonstrate several applications of this algorithm on genomics data. Our program will be freely available for download.
Keywords :
biology computing; computational complexity; genomics; pattern clustering; NP-hard problems; greedy heuristic; multiple clustering; Bioinformatics; Biology computing; Clustering algorithms; Computational complexity; Computer science; Data analysis; Genomics; Heart; Organisms; Stress;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine, 2008. BIBM '08. IEEE International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
978-0-7695-3452-7
Type :
conf
DOI :
10.1109/BIBM.2008.28
Filename :
4684887