DocumentCode
650614
Title
Data Replication for Distributed Graph Processing
Author
Li-Yung Ho ; Jan-Jan Wu ; Pangfeng Liu
Author_Institution
Dept. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ., Taipei, Taiwan
fYear
2013
fDate
June 28 2013-July 3 2013
Firstpage
319
Lastpage
326
Abstract
We present a data replication framework for distributed graph processing. First we partition a graph and store each partition in a machine. Then we replicate all partitions and assign replicas to machines, where each machine can store only a limited number of replicas. The goal is to replicate the partitions so that each partition has at least a certain number of replicated copies, and the cost is minimized. The cost is defined as the data traffic needed to run general graph processing algorithms. The cost metric is the overall transmission cost of all machines, and the maximum transmission cost of a single machine. We propose an optimal algorithm based on linear programming to solve the problem of minimizing the overall transmission cost. We also propose an optimal algorithm to solve a special problem of minimizing the maximum transmission cost of a node.
Keywords
data analysis; distributed processing; graph theory; linear programming; minimisation; cost metric; data replication framework; data traffic; distributed graph processing; graph partitioning; linear programming; transmission cost minimisation; Clustering algorithms; Data models; Distributed databases; Equations; Linear programming; Social network services; Vectors; algorithm; binary integer programming; data replication; minimum cost flow; social networks; totally unimodular;
fLanguage
English
Publisher
ieee
Conference_Titel
Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on
Conference_Location
Santa Clara, CA
Print_ISBN
978-0-7695-5028-2
Type
conf
DOI
10.1109/CLOUD.2013.55
Filename
6676710
Link To Document