  • DocumentCode
    659452
  • Title
    A distributed vertex-centric approach for pattern matching in massive graphs
  • Author
    Fard, Arash; Nisar, M. Usman; Ramaswamy, Lakshmish; Miller, John A.; Saltz, Matthew
  • Author_Institution
    Comput. Sci. Dept., Univ. of Georgia, Athens, GA, USA
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    403
  • Lastpage
    411
  • Abstract
    Graph pattern matching is fundamentally important to many applications such as analyzing hyperlinks in the World Wide Web, mining associations in online social networks, and substructure search in biochemistry. Most existing graph pattern matching algorithms are highly computation-intensive and do not scale to the extremely large graphs that characterize many emerging applications. In recent years, graph processing frameworks such as Pregel have sought to harness shared-nothing clusters for processing massive graphs through a vertex-centric, Bulk Synchronous Parallel (BSP) programming model. However, developing scalable and efficient BSP-based algorithms for pattern matching is very challenging because this problem does not naturally align with a vertex-centric programming paradigm. This paper presents novel distributed algorithms based on the vertex-centric programming paradigm for a set of pattern matching models, namely graph simulation, dual simulation and strong simulation. Our algorithms are fine-tuned to consider the challenges of pattern matching on massive data graphs. Furthermore, we introduce a new pattern matching model, called strict simulation, which outperforms strong simulation in terms of scalability while preserving its important properties. We investigate potential performance bottlenecks and propose several techniques to mitigate them. This paper also presents an extensive set of experiments involving massive graphs (millions of vertices and billions of edges) to study the effects of various parameters on the scalability and performance of the proposed algorithms. The results demonstrate that our techniques are highly effective in alleviating performance bottlenecks and yield significant scalability benefits.
  • Keywords
    digital simulation; distributed algorithms; graph theory; parallel programming; pattern clustering; pattern matching; BSP programming model; Pregel; bulk synchronous parallel programming model; distributed vertex-centric approach; dual simulation; graph pattern matching algorithm; graph processing frameworks; graph simulation; massive graph; shared nothing clusters; strict simulation; strong simulation; vertex-centric programming paradigm; Algorithm design and analysis; Clustering algorithms; Computational modeling; Data models; Programming; data-intensive computing; query graphs; subgraph isomorphism
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2013 IEEE International Conference on Big Data
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691601
  • Filename
    6691601