DocumentCode :
441764
Title :
A new linear approximate clustering algorithm based upon sampling with probability distribution
Author :
Yuan, Chang-an ; Tang, Chang-jie ; Li, Chuan ; Hu, Jian-Jun ; Peng, Jing
Author_Institution :
Coll. of Comput., Sichuan Univ., China
Volume :
3
fYear :
2005
fDate :
18-21 Aug. 2005
Firstpage :
1518
Abstract :
Clustering is an important research direction in knowledge discovery. The classical k-median algorithm has serious deficiencies, such as low efficiency and poor adaptability to large data sets. To address these problems, this paper proposes a new method named LCPD (linear clustering based on probability distribution). The main contributions are: (1) it partitions the buckets using regions of equal probability in the m-dimensional hypercube, so that the number of data items in each layer (namely, each hash bucket) is approximately equal, which yields stratified sampling at small cost; (2) the samples drawn by the new algorithm are sufficiently representative of the whole data set; (3) the complexity of the new algorithm is proved to be O(n); (4) comparative experiments show that the performance of LCPD is two orders of magnitude higher than that of the traditional algorithm when the data set contains about 10000 items, and that the clustering quantity increases by 55% when the data set contains about 8000 items.
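The equal-probability bucketing and stratified sampling described in contribution (1) can be illustrated with a minimal one-dimensional sketch. This is not the authors' LCPD implementation: the abstract does not say which distribution is used, so the sketch assumes the data are roughly normally distributed and estimates the mean and standard deviation from the data. All function names and parameters (`normal_cdf`, `equal_probability_bucket`, `stratified_sample`, `num_buckets`, `per_bucket`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def normal_cdf(x, mean, std):
    """CDF of a normal distribution (assumed model, not taken from the paper)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def equal_probability_bucket(x, mean, std, num_buckets):
    """Hash x to a bucket so that every bucket covers probability mass 1/num_buckets."""
    p = normal_cdf(x, mean, std)
    # Clamp so that p == 1.0 still falls into the last bucket.
    return min(int(p * num_buckets), num_buckets - 1)


def stratified_sample(values, num_buckets, per_bucket, seed=0):
    """Bucket the values in linear passes, then draw up to per_bucket items per bucket."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance) or 1.0  # avoid division by zero for constant data

    buckets = defaultdict(list)
    for v in values:
        buckets[equal_probability_bucket(v, mean, std, num_buckets)].append(v)

    rng = random.Random(seed)
    sample = []
    for index in sorted(buckets):
        items = buckets[index]
        sample.extend(rng.sample(items, min(per_bucket, len(items))))
    return buckets, sample


# Usage: 10000 synthetic points, 10 layers, 20 samples drawn from each layer.
rng = random.Random(42)
data = [rng.gauss(50.0, 10.0) for _ in range(10000)]
buckets, sample = stratified_sample(data, num_buckets=10, per_bucket=20)
```

Because each item is assigned to its bucket by one CDF evaluation, the bucketing is linear in the number of items. The sample could then be passed to any k-median routine in place of the full data set.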
Keywords :
data mining; pattern clustering; probability; sampling methods; k-median algorithm; knowledge discovery; linear approximate clustering algorithm; probability distribution; Clustering algorithms; Costs; Distributed computing; Educational institutions; Information technology; Linear approximation; Partitioning algorithms; Probability; Sampling methods; Statistical distributions; Clustering; Hash function; Probability Distributing; Sampling; k-median algorithm;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location :
Guangzhou, China
Print_ISBN :
0-7803-9091-1
Type :
conf
DOI :
10.1109/ICMLC.2005.1527185
Filename :
1527185