DocumentCode :
23455
Title :
Distributed Information Theoretic Clustering
Author :
Pengcheng Shen ; Chunguang Li
Author_Institution :
Dept. of Inf. Sci. & Electron. Eng., Zhejiang Univ., Hangzhou, China
Volume :
62
Issue :
13
fYear :
2014
fDate :
1-Jul-14
Firstpage :
3442
Lastpage :
3453
Abstract :
Distributed data collection and analysis over networks are ubiquitous, especially over wireless sensor networks (WSNs). Distributed clustering, one of the most important topics in distributed data analysis, aims to uncover the hidden structure of data collected or stored at geographically distributed nodes. In recent years, several distributed data clustering techniques have been developed based on the K-means algorithm or the Gaussian mixture model. These methods capture data structure using measures based only on first- and second-order statistics. When the structure of the cluster data is complicated, these statistics are insufficient and may lead to unsatisfactory clustering results. In such cases, information theoretic measures can achieve better clustering performance because they take the whole distribution of the cluster data into account. In this work, we incorporate an information theoretic measure into the cost function of distributed clustering and present two distributed clustering algorithms, one linear and one kernel-based. In both algorithms, each node solves a local clustering problem through diffusion cooperation with its neighboring nodes. To preserve privacy and save communication costs, nodes exchange only a few parameters, rather than the original data, with their one-hop neighbors. Simulation results on both synthetic and real data show that the proposed distributed algorithms achieve clustering results almost as good as those of the corresponding centralized information theoretic clustering algorithms.
Keywords :
data acquisition; data analysis; data structures; pattern clustering; probability; statistics; wireless sensor networks; Gaussian mixture model; K-means algorithm; WSN; centralized information theoretic clustering algorithms; cluster data; communication costs; cost function; data structures; diffusion cooperation; distributed algorithms; distributed data analysis; distributed data clustering techniques; distributed data collection; distributed information theoretic clustering; first order statistics; geographically distributed nodes; information theoretic measures; kernel distributed clustering algorithms; local clustering problem; second order statistics; wireless sensor networks; Approximation methods; Clustering algorithms; Cost function; Data models; Distributed databases; Mutual information; Signal processing algorithms; Diffusion cooperation; discriminative clustering; distributed clustering; information theory; mutual information;
fLanguage :
English
Journal_Title :
Signal Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1053-587X
Type :
jour
DOI :
10.1109/TSP.2014.2327010
Filename :
6822602