• DocumentCode
    659470
  • Title
    A stream partitioning approach to processing large scale distributed graph datasets
  • Author
    Wang, Rui; Chiu, Kenneth
  • Author_Institution
    Dept. of Comput. Sci., State Univ. of New York at Binghamton, Binghamton, NY, USA
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    537
  • Lastpage
    542
  • Abstract
    RDF datasets are an important source of big data. Many of them, however, are too large to fit on a single machine. One approach to address this is to partition the RDF graph across multiple machines, with each component residing on a single machine. A poor partition can incur significant communication costs, however, if it causes many queries to involve multiple machines. A number of existing partitioning schemes seek to reduce these costs by finding partitions that avoid cutting edges in the RDF graph. While these can successfully find good partitions, the partitioning process itself is often not very scalable and cannot handle incrementally generated RDF data. In this paper, we develop a more scalable, effective, and low-complexity approach, online graph dataset partitioning, to produce high-quality dataset partitions with fewer links between partitions. We show experimentally that it works well in reducing the communication cost of query processing, while at the same time improving the scalability of the partitioning itself.
  • Keywords
    Big Data; graph theory; query processing; RDF dataset; RDF graph partitioning; communication cost; complexity; incrementally-generated RDF data handling; large scale distributed graph dataset processing; online graph dataset partitioning; partition finding; stream partitioning approach; Approximation algorithms; Data handling; Data models; Indexes; Information management; Partitioning algorithms; Resource description framework; dataset partitioning; graph partitioning; large scale; online algorithm
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691619
  • Filename
    6691619