Regional Information Center for Science and Technology - Learning a Bi-Stochastic Data Similarity Matrix

DocumentCode :

2208512

Title :

Learning a Bi-Stochastic Data Similarity Matrix

Author :

Wang, Fei ; Li, Ping ; König, Arnd Christian

Author_Institution :

Dept. of Stat. Sci., Cornell Univ., Ithaca, NY, USA

fYear :

2010

fDate :

13-17 Dec. 2010

Firstpage :

551

Lastpage :

560

Abstract :

An idealized clustering algorithm seeks to learn a cluster-adjacency matrix in which the entry for two data points is 1 if they belong to the same cluster and 0 otherwise. This integer (1/0) constraint makes the optimal solution difficult to find. We propose a relaxation of the cluster-adjacency matrix, deriving a bi-stochastic matrix from a data similarity (e.g., kernel) matrix according to the Bregman divergence. Our general method is named the Bregmanian Bi-Stochastication (BBS) algorithm. We focus on two popular choices of the Bregman divergence: the Euclidean distance and the KL divergence. Interestingly, the BBS algorithm using the KL divergence is equivalent to the Sinkhorn-Knopp (SK) algorithm for deriving bi-stochastic matrices. We show that the BBS algorithm using the Euclidean distance is closely related to relaxed K-means clustering and, in extensive experiments on public data sets, can often produce noticeably better clustering results than the SK algorithm (and other algorithms such as Normalized Cut).
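
The abstract notes that BBS under the KL divergence is equivalent to the Sinkhorn-Knopp algorithm. As a rough illustration of that special case only, the sketch below alternately normalizes the rows and columns of a strictly positive kernel matrix until it is approximately bi-stochastic. It is not taken from the paper: the function names `sinkhorn_knopp` and `rbf_kernel`, the Gaussian kernel, the parameters `gamma`, `n_iter`, and `tol`, and the random data are all assumptions made for the example, and the Euclidean variant of BBS is not shown.

```python
import numpy as np


def rbf_kernel(X, gamma=0.5):
    """Gaussian (RBF) similarity matrix; a hypothetical choice of kernel."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def sinkhorn_knopp(K, n_iter=1000, tol=1e-9):
    """Scale a strictly positive square matrix toward a bi-stochastic one.

    Each pass normalizes the rows, then the columns. Iteration stops once
    the row sums are within `tol` of 1 (the column sums are exactly 1 up
    to rounding after every pass) or after `n_iter` passes.
    """
    P = np.asarray(K, dtype=float).copy()
    for _ in range(n_iter):
        P /= P.sum(axis=1, keepdims=True)  # make every row sum to 1
        P /= P.sum(axis=0, keepdims=True)  # make every column sum to 1
        if np.abs(P.sum(axis=1) - 1.0).max() < tol:
            break
    return P


# Illustrative data: six random 2-D points (an assumption, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
K = rbf_kernel(X)
P = sinkhorn_knopp(K)
```

The resulting `P` is non-negative with row and column sums close to 1, so it can stand in for the similarity matrix in a downstream clustering step. Strict positivity of `K` is assumed here; a matrix containing zeros may need extra conditions for the iteration to converge.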

Keywords :

data handling; integer programming; learning (artificial intelligence); matrix algebra; pattern clustering; Bregman divergence; Bregmanian bistochastication algorithm; Euclidean distance; Sinkhorn-Knopp algorithm; bistochastic data similarity matrix; cluster adjacency matrix; clustering algorithm; integer constraint; k-means clustering

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Mining (ICDM), 2010 IEEE 10th International Conference on

Conference_Location :

Sydney, NSW

ISSN :

1550-4786

Print_ISBN :

978-1-4244-9131-5

Electronic_ISBN :

1550-4786

Type :

conf

DOI :

10.1109/ICDM.2010.141

Filename :

5694009

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2208512