Title :
Data Replication for Distributed Graph Processing
Author :
Li-Yung Ho ; Jan-Jan Wu ; Pangfeng Liu
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ., Taipei, Taiwan
fDate :
June 28-July 3, 2013
Abstract :
We present a data replication framework for distributed graph processing. First, we partition a graph and store each partition on a machine. Then we replicate all partitions and assign the replicas to machines, where each machine can store only a limited number of replicas. The goal is to replicate the partitions so that each partition has at least a given number of copies while the cost is minimized. The cost is the data traffic needed to run general graph processing algorithms, and we consider two cost metrics: the overall transmission cost of all machines, and the maximum transmission cost of any single machine. We propose an optimal algorithm based on linear programming that minimizes the overall transmission cost. We also propose an optimal algorithm for a special case of the problem of minimizing the maximum transmission cost of a single machine.
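The abstract does not give the exact cost model, but the keywords (minimum cost flow, totally unimodular) suggest why an LP can be optimal for the overall-cost problem: a partition-to-machine assignment with "at least r copies per partition" and "at most C replicas per machine" is a transportation-style problem whose constraint matrix is totally unimodular, so its LP relaxation has integral optimal solutions, and it can equivalently be solved as a min-cost flow. The minimal sketch below is an illustration only, not the authors' algorithm. It assumes the overall cost decomposes into a hypothetical nonnegative per-assignment cost `cost[p][m]` (traffic incurred by placing a replica of partition `p` on machine `m`), and all names (`MinCostFlow`, `place_replicas`, `replicas`, `capacity`) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class MinCostFlow:
    """Successive shortest paths with Bellman-Ford; adequate for small instances."""

    def __init__(self, n: int) -> None:
        self.n = n
        # Each edge is [to, residual capacity, cost, index of reverse edge].
        self.graph: List[List[list]] = [[] for _ in range(n)]

    def add_edge(self, u: int, v: int, cap: int, cost: int) -> Tuple[int, int]:
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])
        return u, len(self.graph[u]) - 1  # locate the forward edge later

    def flow(self, s: int, t: int, demand: int) -> Optional[int]:
        """Send `demand` units from s to t at minimum cost; None if infeasible."""
        total_cost = 0
        while demand > 0:
            inf = float("inf")
            dist = [inf] * self.n
            prev: List[Optional[Tuple[int, int]]] = [None] * self.n
            dist[s] = 0
            updated = True
            while updated:  # Bellman-Ford: residual edges may have negative cost
                updated = False
                for u in range(self.n):
                    if dist[u] == inf:
                        continue
                    for i, (v, cap, cost, _) in enumerate(self.graph[u]):
                        if cap > 0 and dist[u] + cost < dist[v]:
                            dist[v] = dist[u] + cost
                            prev[v] = (u, i)
                            updated = True
            if dist[t] == inf:
                return None  # not enough machine capacity for the demand
            push, v = demand, t
            while v != s:  # find the bottleneck capacity on the path
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            v = t
            while v != s:  # augment along the path
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            demand -= push
            total_cost += push * dist[t]
        return total_cost


def place_replicas(
    cost: List[List[int]], replicas: int, capacity: int
) -> Optional[Tuple[int, Dict[int, List[int]]]]:
    """Give each partition `replicas` copies on distinct machines, at most
    `capacity` copies per machine, minimizing the summed (hypothetical) cost.
    With nonnegative costs, "at least r" copies is optimally met with exactly r."""
    num_parts, num_machines = len(cost), len(cost[0])
    src, sink = 0, num_parts + num_machines + 1
    net = MinCostFlow(num_parts + num_machines + 2)
    assign = {}
    for p in range(num_parts):
        net.add_edge(src, 1 + p, replicas, 0)
        for m in range(num_machines):
            # Unit capacity: at most one copy of partition p per machine m.
            assign[(p, m)] = net.add_edge(1 + p, 1 + num_parts + m, 1, cost[p][m])
    for m in range(num_machines):
        net.add_edge(1 + num_parts + m, sink, capacity, 0)
    total = net.flow(src, sink, num_parts * replicas)
    if total is None:
        return None
    placement = {
        p: [m for m in range(num_machines)
            if net.graph[assign[(p, m)][0]][assign[(p, m)][1]][1] == 0]
        for p in range(num_parts)
    }
    return total, placement
```

For example, `place_replicas([[0, 3, 5], [4, 0, 2]], replicas=2, capacity=2)` returns `(5, {0: [0, 1], 1: [1, 2]})`, and the same call with `capacity=1` returns `None` because three machines holding one replica each cannot host four copies. The flow formulation is used here only because integral min-cost flow solutions make the totally unimodular structure concrete; a generic LP solver over the same constraints would reach the same optimum under these assumptions.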
Keywords :
data analysis; distributed processing; graph theory; linear programming; minimisation; cost metric; data replication framework; data traffic; distributed graph processing; graph partitioning; linear programming; transmission cost minimisation; Clustering algorithms; Data models; Distributed databases; Equations; Linear programming; Social network services; Vectors; algorithm; binary integer programming; data replication; minimum cost flow; social networks; totally unimodular;
Conference_Title :
2013 IEEE Sixth International Conference on Cloud Computing (CLOUD)
Conference_Location :
Santa Clara, CA
Print_ISBN :
978-0-7695-5028-2
DOI :
10.1109/CLOUD.2013.55