• DocumentCode
    3756470
  • Title
    Scalable Fast Evolutionary k-Means Clustering
  • Author
    Gilberto Viana de Oliveira;Murilo Coelho Naldi
  • Author_Institution
    Dept. de Inf., Univ. Fed. de Vicosa, Vicosa, Brazil
  • fYear
    2015
  • Firstpage
    74
  • Lastpage
    79
  • Abstract
    The increasing amount of data requires greater scalability from clustering algorithms. The intrinsic parallelism of the MapReduce model brings manageability and reliability to large-scale distributed operations. However, its restrictions hinder the direct application of several traditional clustering algorithms. K-means is one of the few clustering algorithms that satisfy the MapReduce constraints, but it requires the number of clusters to be specified in advance and is sensitive to their initialization. This paper proposes a MapReduce algorithm that evolves clusters without the need to specify k-means' parameters. Through evolutionary operators, the clusters obtained are used to search for better solutions, allowing the algorithm to find quality solutions quickly. The algorithm is compared with state-of-the-art MapReduce versions of a systematic algorithm that is able to find the number of k-means clusters and their initializations. Computational experiments and statistical analyses of the results indicate that the proposed algorithm obtains clusters of quality equal to or better than those of the compared algorithm, and does so faster.
  • Keywords
    "Clustering algorithms","Partitioning algorithms","Algorithm design and analysis","Prototypes","Data models","Sociology","Statistics"
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems (BRACIS), 2015 Brazilian Conference on
  • Type
    conf
  • DOI
    10.1109/BRACIS.2015.20
  • Filename
    7423998