DocumentCode :
738413
Title :
DSP-CC: I/O Efficient Parallel Computation of Connected Components in Billion-Scale Networks
Author :
Kim, Min-Soo ; Lee, Sangyeon ; Han, Wook-Shin ; Park, Himchan ; Lee, Jeong-Hoon
Author_Institution :
Department of Information and Communication Engineering, DGIST, 333 Techno Jungang-daero, Hyeonpung-myeon, Dalseong-gun, Daegu, Republic of Korea
Volume :
27
Issue :
10
fYear :
2015
Firstpage :
2658
Lastpage :
2671
Abstract :
Computing connected components is a core operation on graph data. Since billion-scale graphs cannot reside in the memory of a single server, several approaches based on distributed machines have recently been proposed. The representative methods are Hash-To-Min and PowerGraph. Hash-To-Min is the state-of-the-art disk-based distributed method, which minimizes the number of MapReduce rounds. PowerGraph is the state-of-the-art in-memory distributed system; it is typically faster than the disk-based distributed method but requires many machines to handle billion-scale graphs. In this paper, we propose an I/O-efficient parallel algorithm for billion-scale graphs on a single PC. We first propose the Disk-based Sequential access-oriented Parallel processing (DSP) model, which exploits sequential access for disk I/O and parallel processing for computation. We then propose an ultra-fast disk-based parallel algorithm for computing connected components, DSP-CC, which substantially improves performance through sequential disk scans and page-level cache-conscious parallel processing. Extensive experimental results show that DSP-CC 1) computes connected components in billion-scale graphs using limited memory, whereas in-memory algorithms can only support medium-sized graphs with the same memory size, and 2) significantly outperforms all distributed competitors as well as a representative disk-based parallel method.
Keywords :
Computational modeling; Data models; Digital signal processing; Memory management; Parallel processing; Performance evaluation; Vectors; Graphs; SSD; connected components; disk-based; graphs; parallel;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2015.2419665
Filename :
7079453