Title :
Parallel Set Determination and K-Means Clustering for Data Mining on Telecommunication Networks
Author :
Da-Qi Ren ; Da Zheng ; Guowei Huang ; Shujie Zhang ; Zane Wei
Author_Institution :
US R&D Center, Huawei Technol., Santa Clara, CA, USA
Abstract :
Data mining (DM) techniques have developed in tandem with the telecommunications market. They are designed to analyze communication behaviors in order to enable personalized services and reduce customer churn. The main DM process uses data exploration technology to extract data, builds predictive models using decision trees, and then tests and verifies the stability and effectiveness of those models. The K-means method segments customers into clusters based on billing, loyalty and payment behaviors, and these clusters are used to create decision tree-based models. Determining the number of clusters, k, in a data set when there is limited prior knowledge of the appropriate value is a common problem, and it is distinct from the problem of clustering the data itself. Several categories of methods exist for choosing k; the optimal choice balances maximal compression of the data (a single cluster) against maximal accuracy (each observation assigned to its own cluster). This paper presents a parallel approach for accelerating the determination of k for a data set of n observations. We introduce two methods for selecting the initial centroids that reduce the number of iterations in K-means clustering: 1) carrying centroids forward, and 2) minimum impact. Both approaches are designed to speed up K-means computation and the identification of k.
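The abstract does not spell out how centroids are carried forward, so the following is only a minimal sketch of one plausible reading: when sweeping k upward, the converged centroids for k are reused as initial centroids for k + 1, plus one extra seed. The seeding rule (the worst-fitting point), the function names `kmeans` and `sweep_k`, and the toy data are all assumptions made for illustration, not the authors' method. The "minimum impact" variant is not sketched because the abstract gives no detail on it.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    sse = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, sse, iteration

def sweep_k(points, k_max):
    """Run K-means for k = 1..k_max, warm-starting each run from the last.

    Assumed seeding rule (not from the paper): the extra centroid for
    k + 1 is the point that is farthest from its current centroid.
    """
    centroids = [tuple(sum(x) / len(points) for x in zip(*points))]
    results = {}
    for k in range(1, k_max + 1):
        centroids, labels, sse, iters = kmeans(points, centroids)
        results[k] = (sse, iters)
        worst = max(
            zip(points, labels),
            key=lambda pl: sq_dist(pl[0], centroids[pl[1]]),
        )[0]
        centroids = centroids + [tuple(worst)]  # carry forward + one new seed
    return results

# Toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
results = sweep_k(points, 4)
for k, (sse, iters) in results.items():
    print(k, round(sse, 3), iters)
```

With this kind of warm start, the within-cluster sum of squared errors cannot increase from k to k + 1, so the resulting curve can be fed to any of the usual rules for choosing k (for example, looking for the point where the decrease levels off).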
Keywords :
data compression; data mining; decision trees; invoicing; iterative methods; pattern clustering; telecommunication computing; telecommunication networks; DM techniques; K-mean clustering; K-mean computing; communication behavior analysis; customer churn; data clustering issues; data exploration technology; data mining techniques; decision tree-based models; model effectiveness; model stability; parallel set determination; telecommunication market; Acceleration; Clustering algorithms; Data mining; Merging; Optimization; Parallel algorithms; Silicon; Distributed data mining; High performance implementations of data mining algorithms
Conference_Title :
High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (HPCC_EUC), 2013 IEEE 10th International Conference on
Conference_Location :
Zhangjiajie
DOI :
10.1109/HPCC.and.EUC.2013.218