• DocumentCode
    2191104
  • Title
    dSimpleGraph: A Novel Distributed Clustering Algorithm for Exploring Very Large Scale Unknown Data Sets
  • Author
    Lu, Li; Gu, Yunhong; Grossman, Robert

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Illinois at Chicago, Chicago, IL, USA
  • fYear
    2010
  • fDate
    13 Dec. 2010
  • Firstpage
    162
  • Lastpage
    169
  • Abstract
    Current clustering applications face several major challenges: some data sets are too large to load into memory in their entirety; data sets are often distributed across different locations, which makes centralized processing impossible; and, without prior knowledge of an unknown data set, it is difficult to choose appropriate parameters for existing clustering algorithms. A distributed clustering algorithm with few parameters is therefore appealing. Although some distributed clustering algorithms have been proposed, solving all of these problems together remains a challenge for them. In this paper, we propose and implement a novel micro-cluster-based distributed clustering algorithm called dSimpleGraph. An equivalence relation on two micro-clusters is defined. Relying on this relation, dSimpleGraph can efficiently cluster data on the local machines and can easily generate a determined global view from the local views. Only two scalar parameters are needed, and the generated clusters can be of any shape. Its MapReduce-style structure allows it to be easily implemented on existing distributed computing platforms. Extensive experimental studies show that dSimpleGraph is very fast and well suited to exploring very large scale unknown data sets.
  • Keywords
    distributed algorithms; graph theory; pattern clustering; MapReduce-style structure; dSimpleGraph; distributed clustering algorithm; distributed computing platform; unknown data set; equivalence relation; MapReduce
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
  • Conference_Location
    Sydney, NSW
  • Print_ISBN
    978-1-4244-9244-2
  • Electronic_ISBN
    978-0-7695-4257-7
  • Type
    conf
  • DOI
    10.1109/ICDMW.2010.12
  • Filename
    5693296