DocumentCode :
2207730
Title :
Algorithm for Discovering Low-Variance 3-Clusters from Real-Valued Datasets
Author :
Hu, Zhen ; Bhatnagar, Raj
Author_Institution :
Dept. of Comput. Sci., Univ. of Cincinnati, Cincinnati, OH, USA
fYear :
2010
fDate :
13-17 Dec. 2010
Firstpage :
236
Lastpage :
245
Abstract :
The concept of triclusters has been investigated recently in the context of two relational datasets that share labels along one of the dimensions. By simultaneously processing two datasets to unveil triclusters, new useful knowledge and insights can be obtained. However, some recently reported methods are either closely linked to specific problems or constrain the datasets to have specific distributions. Many applications need algorithms that generate triclusters whose cell-values demonstrate simple, well-known statistical properties, such as upper bounds on standard deviations. In this paper we present a 3-Clustering algorithm that searches for meaningful combinations of biclusters in two related datasets. The algorithm can handle situations in which (i) a few data objects are present in only one of the two datasets, (ii) the two datasets have different numbers of objects and/or attributes, and (iii) the cell-value distributions in the two datasets differ. In our formulation, each selected tricluster is formed by two independent biclusters such that the standard deviation of the cell-values in each bicluster obeys an upper bound and the sets of objects in the two biclusters overlap to the maximum possible extent. We validate our algorithm by presenting the properties of the 3-Clusters discovered from a synthetic dataset and from a real-world cross-species genomic dataset. The results of our algorithm unveil interesting insights for the cross-species genomic domain.
Keywords :
data mining; pattern clustering; search problems; statistical analysis; cell-value distributions; low variance cluster; real valued dataset; relational datasets; standard deviation; statistical property; co-clustering; triclusters
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining (ICDM), 2010 IEEE 10th International Conference on
Conference_Location :
Sydney, NSW
ISSN :
1550-4786
Print_ISBN :
978-1-4244-9131-5
Electronic_ISBN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2010.77
Filename :
5693977