DocumentCode :
659470
Title :
A stream partitioning approach to processing large scale distributed graph datasets
Author :
Wang, Rui ; Chiu, Kenneth
Author_Institution :
Dept. of Comput. Sci., State Univ. of New York at Binghamton, Binghamton, NY, USA
fYear :
2013
fDate :
6-9 Oct. 2013
Firstpage :
537
Lastpage :
542
Abstract :
RDF datasets are an important source of big data. Many of them, however, are too large to fit on a single machine. One way to address this is to partition the RDF graph across multiple machines, with each component residing on a single machine. A poor partition, however, can incur significant communication costs if it causes many queries to span multiple machines. A number of existing partitioning schemes seek to reduce these costs by finding partitions that avoid cutting edges in the RDF graph. While these schemes can find good partitions, the partitioning process itself is often not very scalable and cannot handle incrementally generated RDF data. In this paper, we develop a more scalable, effective, and low-complexity approach, online graph dataset partitioning, that produces high-quality dataset partitions with fewer links between partitions. We show experimentally that it effectively reduces the communication cost of query processing while also improving the scalability of the partitioning itself.
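The abstract does not spell out the online partitioning algorithm itself. As a rough illustration of what a one-pass (streaming) partitioner looks like, the sketch below uses a well-known generic heuristic, Linear Deterministic Greedy (LDG). It is not the authors' method. Each vertex is placed, as it arrives, in the partition that already holds most of its neighbors, with that count discounted by how full the partition is. Treating RDF resources as vertices and triples as undirected edges is an assumption made here for simplicity. The function names (`ldg_partition`, `round_robin_partition`, `edge_cut`, `adjacency_stream`) and the `capacity` parameter are hypothetical.

```python
from collections import defaultdict


def ldg_partition(vertex_stream, k, capacity):
    """Hypothetical sketch of Linear Deterministic Greedy streaming partitioning.

    vertex_stream yields (vertex, neighbors) pairs in arrival order; neighbors may
    include vertices not seen yet. capacity is the maximum vertices per partition.
    Each vertex is placed exactly once and never moved (a single online pass).
    """
    assignment = {}
    sizes = [0] * k
    for v, neighbors in vertex_stream:
        best, best_score = None, float("-inf")
        for p in range(k):
            if sizes[p] >= capacity:
                continue  # partition is full
            # Neighbors already placed in p: putting v there avoids cutting those edges.
            placed = sum(1 for u in neighbors if assignment.get(u) == p)
            # Linear penalty discourages filling one partition while others stay empty.
            score = placed * (1 - sizes[p] / capacity)
            # On ties, prefer the less loaded partition.
            if score > best_score or (score == best_score and sizes[p] < sizes[best]):
                best, best_score = p, score
        if best is None:
            raise ValueError("all partitions are full; increase capacity")
        assignment[v] = best
        sizes[best] += 1
    return assignment


def round_robin_partition(vertex_stream, k):
    """Structure-blind baseline: deal vertices out to partitions in turn."""
    return {v: i % k for i, (v, _) in enumerate(vertex_stream)}


def edge_cut(edges, assignment):
    """Count edges whose endpoints land in different partitions (cross-machine links)."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def adjacency_stream(edges, order):
    """Yield (vertex, neighbors) pairs in the given arrival order from an undirected edge list."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    for v in order:
        yield v, adj[v]


if __name__ == "__main__":
    # Toy graph: two triangles joined by a single bridge edge (c, d).
    edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
             ("d", "e"), ("d", "f"), ("e", "f")]
    order = ["a", "b", "c", "d", "e", "f"]
    ldg = ldg_partition(adjacency_stream(edges, order), k=2, capacity=3)
    rr = round_robin_partition(adjacency_stream(edges, order), k=2)
    print("LDG cut:", edge_cut(edges, ldg))          # LDG cut: 1
    print("round-robin cut:", edge_cut(edges, rr))   # round-robin cut: 5
```

On this toy input, LDG keeps each triangle together and cuts only the bridge edge, while round-robin cuts five of the seven edges. This is an illustration only, not a result from the paper. Because every vertex is scored against k partitions using its adjacency list, a pass costs roughly O(k·|E|). The trade-off of never revisiting a placement is that early decisions cannot be corrected later.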
Keywords :
Big Data; graph theory; query processing; RDF dataset; RDF graph partitioning; communication cost; complexity; incrementally-generated RDF data handling; large scale distributed graph dataset processing; online graph dataset partitioning; partition finding; stream partitioning approach; Approximation algorithms; Data handling; Data models; Indexes; Information management; Partitioning algorithms; Resource description framework; dataset partitioning; graph partitioning; large scale; online algorithm;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
Type :
conf
DOI :
10.1109/BigData.2013.6691619
Filename :
6691619