Title :
eCCV: A new fuzzy cluster validity measure for large relational bioinformatics datasets
Author :
Popescu, Mihail ; Bezdek, James C. ; Keller, James M.
Author_Institution :
Health Manage. & Med. Inf. Dept., U. of Missouri, Columbia, MO, USA
Abstract :
The BLAST sequence comparison algorithm and microarray technology are among the reasons that bioinformatics is the domain with the most abundant large relational datasets. For example, by BLAST-ing the genes of the human genome (around 30,000 genes) we obtain a 30,000 by 30,000 distance matrix. This matrix cannot currently be stored in the memory of a typical desktop PC. At the same time, clustering the resulting matrix with a fuzzy relational clustering algorithm such as Non-Euclidean Relational Fuzzy C-Means (NERFCM) requires prior knowledge of the number of clusters present in the dataset. The question is: how can we estimate the number of clusters if we cannot even load the matrix into the memory of our PC? To address this problem, we propose an extension of the correlation cluster validity (CCV) measure that we introduced in a previous paper, and we denote the new validity measure eCCV. eCCV consists of two steps: sampling the large matrix, then estimating the number of clusters by applying CCV to the sampled data. The sampling strategy also produces a significant processing speedup. We illustrate the properties of eCCV on a large synthetic dataset and on a large subset of human genes obtained from the RefSeq database.
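The two-step idea in the abstract (sample the matrix, then score candidate cluster counts on the sample) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define CCV, so `ccv_like` assumes a correlation between the sampled dissimilarities and a dissimilarity induced by the fuzzy partition; `rfcm` is a plain relational fuzzy c-means used as a stand-in for NERFCM; uniform random sampling is assumed; and `dist`, `sample_submatrix`, and `full_matrix_gib` are hypothetical names.

```python
import numpy as np

def full_matrix_gib(n, bytes_per_entry=8):
    """Memory needed for a dense n-by-n matrix of 8-byte floats, in GiB."""
    return n * n * bytes_per_entry / 2**30

def sample_submatrix(dist, n, n_sample, rng):
    """Pick n_sample objects uniformly at random (an assumption) and build
    only their n_sample-by-n_sample dissimilarity submatrix. `dist(i, j)`
    is a hypothetical callback returning one dissimilarity on demand."""
    idx = np.sort(rng.choice(n, size=n_sample, replace=False))
    sub = np.array([[dist(i, j) for j in idx] for i in idx], dtype=float)
    return idx, sub

def rfcm(D, c, m=2.0, n_iter=100, seed=0):
    """Minimal relational fuzzy c-means, a stand-in for NERFCM. Assumes D
    is symmetric and holds squared Euclidean distances. Returns a c-by-n
    membership matrix whose columns sum to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, D.shape[0]))
    U /= U.sum(axis=0)
    for _ in range(n_iter):
        V = U**m
        V /= V.sum(axis=1, keepdims=True)        # relational "prototypes"
        DV = V @ D                               # row i holds (D v_i)^T
        d = DV - 0.5 * np.sum(DV * V, axis=1, keepdims=True)
        d = np.maximum(d, 1e-12)                 # guard against zero distance
        W = d ** (-1.0 / (m - 1.0))
        U = W / W.sum(axis=0)
    return U

def ccv_like(D, U):
    """Assumed correlation-style validity score: Pearson correlation between
    the dissimilarities in D and those induced by the partition U."""
    S = U.T @ U
    induced = 1.0 - S / S.max()
    iu = np.triu_indices_from(D, k=1)            # each pair counted once
    return np.corrcoef(D[iu], induced[iu])[0, 1]

# Toy data: three well-separated 2-D blobs, 300 objects in total.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
points = np.vstack([ctr + rng.normal(scale=0.5, size=(100, 2)) for ctr in centers])

def dist(i, j):
    """Squared Euclidean distance, computed on demand for one pair."""
    return float(np.sum((points[i] - points[j]) ** 2))

# Step 1: sample 60 objects. Step 2: score each candidate cluster count.
idx, sub = sample_submatrix(dist, n=len(points), n_sample=60, rng=rng)
scores = {c: ccv_like(sub, rfcm(sub, c)) for c in range(2, 6)}
best_c = max(scores, key=scores.get)
print(f"full 30,000 x 30,000 matrix: {full_matrix_gib(30_000):.1f} GiB")
print(scores, "-> suggested number of clusters:", best_c)
```

The memory helper makes the motivation concrete: a dense 30,000 by 30,000 matrix of 8-byte values needs roughly 7.2 GB, whereas the sampled submatrix above holds only 3,600 entries. In a real setting `dist` would presumably read entries from disk or compute them per pair, so the full matrix never has to be resident in memory.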
Keywords :
bioinformatics; genomics; matrix algebra; pattern clustering; BLAST sequence comparison algorithm; RefSeq database; correlation cluster validity; distance matrix; eCCV; fuzzy cluster validity measure; fuzzy relational clustering algorithm; human genome; microarray technology; nonEuclidean fuzzy C-means; relational bioinformatics datasets; Bioinformatics; Biomedical informatics; Clustering algorithms; Fuzzy sets; Genomics; Humans; Partitioning algorithms; Protein engineering; Relational databases; Sampling methods;
Conference_Title :
Fuzzy Systems, 2009. FUZZ-IEEE 2009. IEEE International Conference on
Conference_Location :
Jeju Island
Print_ISBN :
978-1-4244-3596-8
ISSN :
1098-7584
DOI :
10.1109/FUZZY.2009.5277214