• DocumentCode
    3678365
  • Title

    GraphTrek: Asynchronous Graph Traversal for Property Graph-Based Metadata Management

  • Author

    Dong Dai;Philip Carns;Robert B. Ross;John Jenkins;Kyle Blauer;Yong Chen

  • Author_Institution
    Comput. Sci. Dept., Texas Tech Univ., Lubbock, TX, USA
  • fYear
    2015
  • Firstpage
    284
  • Lastpage
    293
  • Abstract
    Property graphs are a promising data model for rich metadata management in high-performance computing (HPC) systems because of their ability to represent not only metadata attributes but also the relationships between them. A property graph can be used to record the relationships between users, jobs, and data, for example, with unique annotations for each entity. This high-volume, power-law distributed use case is a natural fit for an out-of-core distributed property graph database. Such a system must support live updates (to ingest production information in real time), low-latency point queries (for frequent metadata operations such as permission checking), and large-scale traversals (for provenance data mining). Large-scale property graph traversals are particularly challenging for distributed graph databases, however. Most existing graph databases implement a "level-synchronous" breadth-first search algorithm that relies on global synchronization in each traversal step. This traversal model performs well in many problem domains, but a rich metadata management system is characterized by imbalanced graphs, long traversal lengths, and concurrent workloads, each of which has the potential to introduce or exacerbate stragglers. We define stragglers as abnormally slow steps (or servers) in a graph traversal that lead to low overall throughput for synchronous traversal algorithms. The straggler problem can be mitigated by the use of asynchronous traversal algorithms. Asynchronous traversal has been successfully demonstrated in graph processing frameworks, but such systems require the graph to be loaded into a separate batch-processing framework. In this work, we propose GraphTrek, a general asynchronous graph traversal engine working with graph databases for processing rich metadata management in their native format. We also outline a traversal-aware query language and key optimizations (traversal-affiliate caching and execution merging) necessary for efficient performance. Our experiments show that the asynchronous graph traversal engine is more efficient than its synchronous counterpart in the case of HPC rich metadata processing, where more servers are involved and larger traversals are needed.
  • Keywords
    "Servers","Metadata","Databases","Engines","Synchronization","Optimization"
  • Publisher
    ieee
  • Conference_Titel
    Cluster Computing (CLUSTER), 2015 IEEE International Conference on
  • Type

    conf

  • DOI
    10.1109/CLUSTER.2015.48
  • Filename
    7307595