  • DocumentCode
    76794
  • Title
    Heterogeneous Environment Aware Streaming Graph Partitioning
  • Author
    Ning Xu ; Bin Cui ; Lei Chen ; Zi Huang ; Yingxia Shao
  • Author_Institution
    Sch. of EECS, Peking Univ., Beijing, China
  • Volume
    27
  • Issue
    6
  • fYear
    2015
  • fDate
    June 1 2015
  • Firstpage
    1560
  • Lastpage
    1572
  • Abstract
    With the increasing availability of graph data and the widely adopted cloud computing paradigm, graph partitioning has become an efficient pre-processing technique to balance the computing workload and cope with the large scale of input data. Since the cost of partitioning the entire graph is prohibitive, recent work has turned to streaming graph partitioning, which runs faster, is easily parallelized, and can be incrementally updated. Most existing work on streaming partitioning assumes that the worker nodes within a cluster are homogeneous. Unfortunately, this assumption does not always hold. Experiments show that these homogeneous algorithms suffer significant performance degradation when running in a heterogeneous environment. In this paper, we propose a novel adaptive streaming graph partitioning approach to cope with heterogeneous environments. We first formally model the heterogeneous computing environment, taking into account the imbalance in computing ability (e.g., the CPU frequency) and communication ability (e.g., the network bandwidth) across nodes. Based on this model, we propose a new graph partitioning objective function that aims to minimize the total execution time of the graph-processing job. We then explore some simple yet effective streaming algorithms for this objective function that achieve balanced and efficient partitioning results. Extensive experiments are conducted on a moderate-sized computing cluster with real-world web and social network graphs. The results demonstrate that the proposed approach achieves significant improvement compared with state-of-the-art solutions.
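The abstract does not spell out the paper's objective function or algorithms, so the following is only an illustrative sketch of the general idea of heterogeneity-aware streaming partitioning, not the authors' method. It swaps in a capacity-weighted variant of the well-known linear deterministic greedy heuristic: each vertex is placed, as it arrives, on the worker that already holds most of its neighbours, discounted by how full that worker is. The function name `stream_partition`, the `speeds` list of relative compute speeds, and the `slack` factor are all assumptions made for illustration; communication ability is not modelled here.

```python
import math


def stream_partition(adjacency, speeds, slack=1.1):
    """Assign each streamed vertex to one worker (hypothetical sketch).

    adjacency: dict mapping vertex -> list of neighbour vertices; the
               iteration order of the dict is treated as the stream order.
    speeds:    relative compute speed of each worker (assumed input).
    slack:     how far a worker may exceed its proportional share.
    """
    n = len(adjacency)
    total_speed = sum(speeds)
    # Capacity is proportional to speed; rounding up guarantees that the
    # capacities together can hold all n vertices.
    caps = [math.ceil(slack * n * s / total_speed) for s in speeds]
    sizes = [0] * len(speeds)
    assignment = {}

    for vertex, neighbours in adjacency.items():
        best_worker, best_key = None, None
        for w in range(len(speeds)):
            if sizes[w] >= caps[w]:
                continue  # this worker is full
            # Neighbours of the vertex already placed on worker w.
            placed = sum(1 for u in neighbours if assignment.get(u) == w)
            load = sizes[w] / caps[w]
            # Greedy score: locality discounted by relative load;
            # ties go to the relatively emptier worker.
            key = (placed * (1.0 - load), -load)
            if best_key is None or key > best_key:
                best_worker, best_key = w, key
        assignment[vertex] = best_worker
        sizes[best_worker] += 1
    return assignment


if __name__ == "__main__":
    # Two triangles joined by one edge; worker 0 is twice as fast as worker 1.
    graph = {
        0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
        3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
    }
    print(stream_partition(graph, speeds=[2.0, 1.0]))
```

Scaling capacity by speed is the simplest way to keep a slow worker from becoming the straggler; a model that also charged for cut edges over slow links would need per-worker bandwidth as a further input.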
  • Keywords
    cloud computing; graph theory; heterogeneous environment; streaming graph partitioning; Bandwidth; Clustering algorithms; Computational modeling; Hardware; Linear programming; Partitioning algorithms; BSP Model; Graph Partitioning; Streaming Algorithms
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2014.2377743
  • Filename
    6975163