• DocumentCode
    3339283
  • Title

    Consensus Clustering on big data

  • Author

    Hongfu Liu ; Gong Cheng ; Junjie Wu

  • Author_Institution
    Northeastern Univ., Boston, MA, USA
  • fYear
    2015
  • fDate
    22-24 June 2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Big data clustering has become a hot topic with the rise of user-generated content. Although many clustering algorithms have been proposed and cloud computing resources are widely available, obtaining a good-quality partition with high efficiency remains an open problem. In this paper, we use consensus clustering to handle Big Data clustering. Specifically, we use a divide-and-conquer strategy to split the whole data set into small subsets; basic partitions are then generated from these subsets, and consensus clustering is applied to obtain the final result. For the consensus step, we apply K-means-based Consensus Clustering (KCC), which transforms the consensus clustering problem into an equivalent K-means-like optimization problem for high efficiency. Further, two-sided sampling is extended by random sampling on instances and features simultaneously. Extensive experiments on eight real-world data sets demonstrate the advantages of KCC over some widely used methods. More importantly, its ability to handle incomplete basic partitions and its natural suitability for distributed computing make KCC a promising candidate for Big Data clustering.
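    The pipeline described in the abstract (generate basic partitions from subsets, then reduce consensus clustering to a K-means-like problem) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: it assumes the squared-Euclidean variant of KCC, in which K-means is run on the concatenated one-hot encodings of the basic partitions, and it samples features only, so every basic partition is complete. Handling of incomplete basic partitions (instance sampling) is omitted. All function names and parameters (`kmeans`, `basic_partitions`, `binary_matrix`, `consensus_clustering`, `n_partitions`, `seed`) are hypothetical.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means (squared Euclidean); returns one label per point."""
    # Initialise from distinct rows so that no two starting centers coincide.
    unique = sorted(set(map(tuple, points)))
    centers = [list(c) for c in rng.sample(unique, min(k, len(unique)))]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(len(centers)),
                      key=lambda j: sum((a - b) ** 2
                                        for a, b in zip(p, centers[j])))
                  for p in points]
        # Update step: mean of each non-empty cluster (empty ones keep their center).
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def basic_partitions(data, k, n_partitions, rng):
    """Assumed scheme: each basic partition is K-means on a random half of the features."""
    d = len(data[0])
    result = []
    for _ in range(n_partitions):
        feats = rng.sample(range(d), max(1, d // 2))
        subset = [[row[f] for f in feats] for row in data]
        result.append(kmeans(subset, k, rng))
    return result

def binary_matrix(partitions, k):
    """Concatenate the one-hot encodings of all basic partitions, one row per instance."""
    n = len(partitions[0])
    return [[1.0 if part[i] == c else 0.0 for part in partitions for c in range(k)]
            for i in range(n)]

def consensus_clustering(data, k, n_partitions=10, seed=0):
    """Consensus step: K-means on the binary matrix built from the basic partitions."""
    rng = random.Random(seed)
    parts = basic_partitions(data, k, n_partitions, rng)
    return kmeans(binary_matrix(parts, k), k, rng)
```

    In this sketch each basic partition depends only on its own feature subset, so the calls inside `basic_partitions` could in principle be run on separate workers before the final consensus step.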
  • Keywords
    Big Data; learning (artificial intelligence); pattern clustering; random processes; sampling methods; Big Data clustering; Big Data dissembling; K-means-based consensus clustering; K-means-like optimization problem; KCC; data partitioning; data subsets; distributed computing; divide-and-conquer strategy; incomplete basic partition handling; random sampling; real-world data sets; two-sided sampling; user generated contents; Big data; Clustering algorithms; Convex functions; Linear programming; Optimization; Partitioning algorithms; Robustness;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Service Systems and Service Management (ICSSSM), 2015 12th International Conference on
  • Conference_Location
    Guangzhou
  • Print_ISBN
    978-1-4799-8327-8
  • Type

    conf

  • DOI
    10.1109/ICSSSM.2015.7170344
  • Filename
    7170344