• DocumentCode
    441764
  • Title

    A new linear approximate clustering algorithm based upon sampling with probability distribution

  • Author

    Yuan, Chang-an ; Tang, Chang-jie ; Li, Chuan ; Hu, Jian-Jun ; Peng, Jing

  • Author_Institution
    Coll. of Comput., Sichuan Univ., China
  • Volume
    3
  • fYear
    2005
  • fDate
    18-21 Aug. 2005
  • Firstpage
    1518
  • Abstract
    Clustering is an important research direction in knowledge discovery. The classical k-median algorithm has serious deficiencies, such as low efficiency and poor adaptability to large data sets. To address these problems, this paper proposes a new method named LCPD (linear clustering based on probability distribution). The main contributions are: (1) it partitions the buckets using regions of equal probability in the m-dimensional hypercube, so that the number of data items in each layer (namely, each hash bucket) is approximately equal, which yields stratified sampling at small cost; (2) the samples obtained by the new algorithm are sufficiently representative of the whole data set; (3) it proves that the complexity of the new algorithm is O(n); (4) comparative experiments show that the performance of LCPD is two orders of magnitude higher than that of the traditional method when the number of data items is near 10000, and that the clustering quantity increases by 55% when the number of data items is near 8000.
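
The bucketing and sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' LCPD implementation: the abstract does not give the hash function or the distribution model, so the sketch assumes an independent normal distribution per dimension and uses its CDF to cut each axis into slices of equal probability mass. The names `equal_probability_bucket`, `stratified_sample`, `bins_per_dim`, and `per_bucket` are hypothetical.

```python
import random
from collections import defaultdict
from statistics import NormalDist, fmean, pstdev


def equal_probability_bucket(point, dists, bins_per_dim):
    """Map a point to a bucket key: one equal-probability slice index per dimension."""
    key = []
    for x, dist in zip(point, dists):
        # cdf(x) lies in [0, 1]; equal-width slices of the CDF carry equal
        # probability mass, so buckets receive roughly equal numbers of items.
        idx = min(int(dist.cdf(x) * bins_per_dim), bins_per_dim - 1)
        key.append(idx)
    return tuple(key)


def stratified_sample(points, bins_per_dim, per_bucket, seed=0):
    """Bucket the points in one pass, then draw up to per_bucket items from each bucket."""
    rng = random.Random(seed)
    m = len(points[0])

    # Assumption of this sketch: each dimension is modelled as an independent
    # normal distribution whose parameters are estimated in linear time.
    dists = []
    for d in range(m):
        column = [p[d] for p in points]
        dists.append(NormalDist(fmean(column), pstdev(column)))

    # Single pass over the data: hash every point into its bucket.
    buckets = defaultdict(list)
    for p in points:
        buckets[equal_probability_bucket(p, dists, bins_per_dim)].append(p)

    # Stratified sampling: take the same number of items from every bucket.
    sample = []
    for members in buckets.values():
        sample.extend(rng.sample(members, min(per_bucket, len(members))))
    return buckets, sample


if __name__ == "__main__":
    gen = random.Random(42)
    data = [(gen.gauss(0.0, 1.0), gen.gauss(5.0, 2.0)) for _ in range(10000)]
    buckets, sample = stratified_sample(data, bins_per_dim=4, per_bucket=10)
    print(len(buckets), "buckets,", len(sample), "sampled points")
```

Under these assumptions, parameter estimation, bucketing, and sampling each take a single pass over the data, which is in line with the linear complexity the abstract claims. A clustering step such as k-median would then run on the much smaller `sample` rather than on the full data set.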
  • Keywords
    data mining; pattern clustering; probability; sampling methods; k-median algorithm; knowledge discovery; linear approximate clustering algorithm; probability distribution; Clustering algorithms; Costs; Distributed computing; Educational institutions; Information technology; Linear approximation; Partitioning algorithms; Probability; Sampling methods; Statistical distributions; Clustering; Hash function; Probability Distributing; Sampling; k-median algorithm;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
  • Conference_Location
    Guangzhou, China
  • Print_ISBN
    0-7803-9091-1
  • Type

    conf

  • DOI
    10.1109/ICMLC.2005.1527185
  • Filename
    1527185