DocumentCode
659470
Title
A stream partitioning approach to processing large scale distributed graph datasets
Author
Wang, Rui; Chiu, Kenneth
Author_Institution
Dept. of Comput. Sci., State Univ. of New York at Binghamton, Binghamton, NY, USA
fYear
2013
fDate
6-9 Oct. 2013
Firstpage
537
Lastpage
542
Abstract
RDF datasets are an important source of big data. Many of them, however, are too large to fit on a single machine. One approach to address this is to partition the RDF graph across multiple machines, with each component residing on a single machine. A poor partition can incur significant communication costs, however, if many queries end up involving multiple machines. A number of existing partitioning schemes seek to reduce these costs by finding partitions that avoid cutting edges in the RDF graph. While these can successfully find good partitions, the partitioning process itself is often not very scalable and cannot handle incrementally generated RDF data. In this paper, we develop a more scalable, effective, and low-complexity approach, online graph dataset partitioning, that produces high-quality dataset partitions with fewer links between partitions. We show experimentally that it works well in reducing the communication cost of query processing while also improving the scalability of the partitioning itself.
Keywords
Big Data; graph theory; query processing; RDF dataset; RDF graph partitioning; communication cost; complexity; incrementally-generated RDF data handling; large scale distributed graph dataset processing; online graph dataset partitioning; partition finding; stream partitioning approach; Approximation algorithms; Data handling; Data models; Indexes; Information management; Partitioning algorithms; Resource description framework; dataset partitioning; graph partitioning; large scale; online algorithm
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data, 2013 IEEE International Conference on
Conference_Location
Silicon Valley, CA
Type
conf
DOI
10.1109/BigData.2013.6691619
Filename
6691619