Title :
Learning relaxed 3-clusters from pairs of related datasets
Author :
Jagadeesh Patchala;Raj Bhatnagar
Author_Institution :
Dept. of Electr. Eng. &
Abstract :
In many emerging data mining situations we encounter multiple large binary relational datasets that are generated independently but are semantically interconnected and must be mined simultaneously to obtain an integrated view of the data residing in all of them. The idea of finding 3-clusters is increasingly used in situations where one has to concurrently mine two distinct datasets that share a common domain along a dimension. By discovering 3-clusters, one can obtain important insights into the underlying connections between the objects of different domains. All 3-clustering algorithms for binary datasets presented to date find only 3-clusters in which the two distinct bi-clusters are strict, that is, the rectangle formed by each bi-cluster contains only '1' entries. However, in many real-world applications the datasets are very sparse, and finding a relaxed bi-cluster, one that allows some zeros in the bi-cluster's rectangle, is very valuable. In this paper, we present a novel search-based algorithm that finds relaxed 3-clusters from two binary datasets that share a domain. Each identified 3-cluster involves two relaxed bi-clusters whose overlap in the sets of objects is maximal. Through our algorithm, we are also able to exert finer control over the percentage of 1s allowed in each of the bi-clusters. We validate the effectiveness of our algorithm by using synthetic and real binary datasets from different domains. Our results show that our notion of a 3-cluster produces more meaningful results than 3-clusters with the strict requirement of all ones.
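To make the distinction between strict and relaxed bi-clusters concrete, the following is a minimal Python sketch and not the authors' search algorithm. It assumes, for illustration only, that each dataset is a list of 0/1 rows indexed by the shared objects, and that a candidate 3-cluster is given as a set of shared objects plus one column set per dataset. The names `density`, `is_relaxed_3cluster`, `min_density1`, and `min_density2` are hypothetical and do not come from the paper.

```python
def density(matrix, rows, cols):
    """Fraction of 1 entries in the rectangle rows x cols of a 0/1 matrix."""
    cells = [matrix[r][c] for r in rows for c in cols]
    return sum(cells) / len(cells) if cells else 0.0


def is_relaxed_3cluster(d1, d2, shared, cols1, cols2, min_density1, min_density2):
    """Check a candidate 3-cluster (hypothetical representation).

    `shared` holds the objects of the common domain; `cols1` and `cols2`
    are the column sets of the two bi-clusters. Each bi-cluster must reach
    its own minimum share of 1s; a threshold of 1.0 is the strict case.
    """
    return (density(d1, shared, cols1) >= min_density1
            and density(d2, shared, cols2) >= min_density2)


# Toy data: rows are the shared objects, columns belong to the other domains.
d1 = [[1, 1, 0],
      [1, 0, 0],
      [1, 1, 1]]
d2 = [[1, 1],
      [1, 1],
      [0, 1]]

shared = [0, 1, 2]
strict = is_relaxed_3cluster(d1, d2, shared, [0, 1], [0, 1], 1.0, 1.0)
relaxed = is_relaxed_3cluster(d1, d2, shared, [0, 1], [0, 1], 0.8, 0.8)
print(strict, relaxed)
```

In this toy case each rectangle contains five 1s out of six cells, so the candidate fails the strict all-ones requirement but passes once a density of 0.8 is accepted for each bi-cluster. Using a separate threshold per bi-cluster mirrors the abstract's point about controlling the percentage of 1s in each bi-cluster independently.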
Keywords :
"Meteorology","Clustering algorithms","Diseases","Algorithm design and analysis","Image color analysis","Heuristic algorithms","Big data"
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
DOI :
10.1109/BigData.2015.7363916