Regional Information Center for Science and Technology (RICeST) - Algorithm for Discovering Low-Variance 3-Clusters from Real-Valued Datasets

DocumentCode :

2207730

Title :

Algorithm for Discovering Low-Variance 3-Clusters from Real-Valued Datasets

Author :

Hu, Zhen ; Bhatnagar, Raj

Author_Institution :

Dept. of Comput. Sci., Univ. of Cincinnati, Cincinnati, OH, USA

fYear :

2010

fDate :

13-17 Dec. 2010

Firstpage :

236

Lastpage :

245

Abstract :

The concept of triclusters has recently been investigated in the context of two relational datasets that share labels along one of the dimensions. Simultaneously processing the two datasets to unveil triclusters can yield new, useful knowledge and insights. However, some recently reported methods are either closely tied to specific problems or require the datasets to follow specific distributions. Many applications need algorithms that generate triclusters whose cell values exhibit simple, well-known statistical properties, such as upper bounds on standard deviations. In this paper we present a 3-Clustering algorithm that searches for meaningful combinations of biclusters in two related datasets. The algorithm can handle situations in which: (i) a few data objects are present in only one of the two datasets, (ii) the two datasets have different numbers of objects and/or attributes, and (iii) the cell-value distributions in the two datasets differ. In our formulation, each selected tricluster is formed by two independent biclusters such that the standard deviation of the cell values in each bicluster obeys an upper bound and the sets of objects in the two biclusters overlap to the maximum possible extent. We validate the algorithm by presenting the properties of the 3-Clusters discovered from a synthetic dataset and from a real-world cross-species genomic dataset. The results unveil interesting insights for the cross-species genomic domain.
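The two conditions stated in the abstract (a bounded standard deviation within each bicluster, and a large overlap between the two object sets) can be illustrated with a minimal Python sketch. This is not the authors' search algorithm; it only checks the acceptance criterion for one candidate pair of biclusters. The data layout (nested dictionaries keyed by object and attribute labels), the function names, the use of the population standard deviation, and the use of the Jaccard index as the overlap measure are all assumptions made for illustration.

```python
from statistics import pstdev

def bicluster_std(data, objects, attributes):
    """Population standard deviation of the cells selected by a bicluster.

    `data` is assumed to be a dict of dicts: data[object][attribute] -> float.
    """
    cells = [data[o][a] for o in objects for a in attributes]
    return pstdev(cells)

def satisfies_std_bounds(data1, data2, bic1, bic2, max_std1, max_std2):
    """True if each bicluster respects its own upper bound on standard deviation.

    Each bicluster is a hypothetical (objects, attributes) pair of label lists.
    """
    objs1, attrs1 = bic1
    objs2, attrs2 = bic2
    return (bicluster_std(data1, objs1, attrs1) <= max_std1
            and bicluster_std(data2, objs2, attrs2) <= max_std2)

def object_overlap(bic1, bic2):
    """Jaccard index of the two object sets (an assumed overlap measure)."""
    objs1, objs2 = set(bic1[0]), set(bic2[0])
    union = objs1 | objs2
    return len(objs1 & objs2) / len(union) if union else 0.0

# Toy data: two datasets sharing object labels g1 and g2; g3 and g4 each
# appear in only one dataset, as the abstract allows.
dataset_a = {
    "g1": {"x": 1.0, "y": 1.1},
    "g2": {"x": 0.9, "y": 1.0},
    "g3": {"x": 5.0, "y": 0.1},
}
dataset_b = {
    "g1": {"p": 2.0, "q": 2.1},
    "g2": {"p": 2.0, "q": 1.9},
    "g4": {"p": 9.0, "q": 3.0},
}

bic_a = (["g1", "g2"], ["x", "y"])
bic_b = (["g1", "g2"], ["p", "q"])

accepted = satisfies_std_bounds(dataset_a, dataset_b, bic_a, bic_b, 0.1, 0.1)
overlap = object_overlap(bic_a, bic_b)
```

Separate bounds per dataset are used here because the abstract notes that the cell-value distributions of the two datasets may differ. A full method would also need a search over candidate biclusters, which this sketch leaves out.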

Keywords :

data mining; pattern clustering; search problems; statistical analysis; cell-value distributions; low variance cluster; real valued dataset; relational datasets; standard deviation; statistical property; triclusters; co-clustering

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Mining (ICDM), 2010 IEEE 10th International Conference on

Conference_Location :

Sydney, NSW

ISSN :

1550-4786

Print_ISBN :

978-1-4244-9131-5

Electronic_ISBN :

1550-4786

Type :

conf

DOI :

10.1109/ICDM.2010.77

Filename :

5693977

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2207730