• DocumentCode
    659457
  • Title
    Top-K aggregation over a large graph using shared-nothing systems
  • Author
    Chakraborty, Arpan
  • Author_Institution
    Sch. of Inf. & Comput. Sci., Indiana Univ., Bloomington, IN, USA
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    448
  • Lastpage
    457
  • Abstract
    Analyzing large graphs is crucial to a variety of application domains, such as personalized recommendations in social networks, search engines, communication networks, and computational biology. In these domains, there is a need to process aggregation queries over large graphs. Existing approaches to aggregation are not suitable for large graphs, as they involve multi-way relational joins over gigantic tables or repeated multiplications of large matrices. In this paper, we consider top-K aggregation queries, which identify the top-K nodes with the highest aggregate values over their h-hop neighbors. We propose algorithms for processing such queries over large graphs in a shared-nothing environment. Using the notion of graph partitioning, we propose an update-based algorithm that minimizes network overhead by propagating updates in the neighborhood information. The algorithm partitions a graph across a number of processing nodes and uses an iterative join algorithm within each node. We present a hybrid scheme to further reduce the network overhead during a few initial iterations. We also develop a baseline algorithm based on distributed joins. Our experimental results validate the effectiveness of the proposed algorithms in reducing the aggregation time and in scaling the aggregation computation over a number of distributed hosts.
  • Keywords
    graph theory; iterative methods; query processing; recommender systems; social networking (online); aggregation computation; baseline algorithm; communication networks; computational biology; graph partitioning; h-hop neighbors; iterative join algorithm; large graphs; multiway relation; network overhead; personalized recommendations; search engines; shared nothing environment; shared-nothing systems; social networks; top-K aggregation query; top-k aggregation; update-based algorithm;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691606
  • Filename
    6691606
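
To make the query type named in the abstract concrete, the following is a minimal single-machine sketch of a top-K aggregation over h-hop neighbors. It is not the paper's update-based, hybrid, or distributed-join algorithm, and it does no partitioning. The names `h_hop_neighbors`, `top_k_aggregate`, `adj`, and `score` are hypothetical, and two choices are assumptions made only for illustration: the aggregate is a sum, and a node's own score is excluded from its aggregate.

```python
from collections import deque
import heapq


def h_hop_neighbors(adj, source, h):
    """Return the nodes reachable from `source` in 1..h hops (breadth-first)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == h:
            continue  # do not expand beyond h hops
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    seen.discard(source)  # assumption: the node itself is not its own neighbor
    return seen


def top_k_aggregate(adj, score, h, k):
    """Return the k (node, total) pairs with the largest h-hop score sums."""
    totals = {
        v: sum(score[u] for u in h_hop_neighbors(adj, v, h))
        for v in adj
    }
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Toy undirected path graph a - b - c - d with made-up scores.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
score = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}

result = top_k_aggregate(adj, score, h=2, k=2)
print(result)  # [('b', 8.0), ('c', 7.0)]
```

This sketch runs one breadth-first search per node, which is only practical for small graphs. The abstract's motivation is precisely that such per-node work has to be spread across hosts, with network traffic kept low, once the graph is large.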