Title :
Scaling Techniques for Massive Scale-Free Graphs in Distributed (External) Memory
Author :
Pearce, Roger ; Gokhale, Maya ; Amato, Nancy M.
Author_Institution :
Dept. of Comput. Sci. & Eng., Texas A&M Univ., College Station, TX, USA
Abstract :
We present techniques to process large scale-free graphs in distributed memory. Our aim is to scale to trillions of edges, and our research is targeted at leadership-class supercomputers and clusters with local non-volatile memory, e.g., NAND Flash. We apply an edge list partitioning technique designed to accommodate high-degree vertices (hubs) that create scaling challenges when processing scale-free graphs. In addition to partitioning hubs, we use ghost vertices to represent the hubs to reduce communication hotspots. We present a scaling study with three important graph algorithms: Breadth-First Search (BFS), K-Core decomposition, and Triangle Counting. We also demonstrate scalability on BG/P Intrepid by comparing to best-known Graph500 results [1]. We show results on two clusters with local NVRAM storage that are capable of traversing trillion-edge scale-free graphs. By leveraging node-local NAND Flash, our approach can process datasets thirty-two times larger with only a 39% performance degradation in Traversed Edges Per Second (TEPS).
Keywords :
complex networks; data handling; distributed memory systems; flash memories; network theory (graphs); parallel algorithms; random-access storage; storage management; tree searching; BFS algorithm; BG/P Intrepid; K-Core decomposition algorithm; breadth-first search algorithm; communication hotspot reduction; distributed memory; edge list partitioning technique; external memory; ghost vertices; graph algorithm; high-degree vertices; large scale-free graphs; local NVRAM storage; local nonvolatile memory; massive scale-free graphs; node-local NAND flash; partitioning hubs; performance degradation; scaling techniques; supercomputers; triangle counting algorithm; trillion-edge scale-free graphs; Algorithm design and analysis; Heuristic algorithms; Nonvolatile memory; Partitioning algorithms; Random access memory; Routing; Topology; big data; distributed computing; graph algorithms
Conference_Title :
Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4673-6066-1
DOI :
10.1109/IPDPS.2013.72