Top-K aggregation over a large graph using shared-nothing systems

Author

Chakraborty, Arpan

Author_Institution

Sch. of Inf. & Comput. Sci., Indiana Univ., Bloomington, IN, USA

fYear

2013

fDate

6-9 Oct. 2013

Firstpage

448

Lastpage

457

Abstract

Analyzing large graphs is crucial to a variety of application domains, like personalized recommendations in social networks, search engines, communication networks, computational biology, etc. In these domains, there is a need to process aggregation queries over large graphs. Existing approaches for aggregation are not suitable for large graphs, as they involve multi-way relational joins over gigantic tables or repeated multiplications of large matrices. In this paper, we consider top-K aggregation queries that involve identifying top-K nodes with highest aggregate values over their h-hop neighbors. We propose algorithms for processing such queries over large graphs in a shared nothing environment. Using the notion of graph partitioning, we propose an update-based algorithm that minimizes network overhead by propagating updates in the neighborhood information. The algorithm partitions a graph across a number of processing nodes, and uses an iterative join algorithm within each node. We present a hybrid scheme to further reduce the network overhead during a few initial iterations. We develop a baseline algorithm based on distributed joins. Our experimental results validate the effectiveness of the proposed algorithms in reducing the aggregation time and in scaling the aggregation computation over a number of distributed hosts.

Keywords

graph theory; iterative methods; query processing; recommender systems; social networking (online); aggregation computation; baseline algorithm; communication networks; computational biology; graph partitioning; h-hop neighbors; iterative join algorithm; large graphs; multiway relation; network overhead; personalized recommendations; search engines; shared nothing environment; shared-nothing systems; social networks; top-K aggregation query; top-k aggregation; update-based algorithm;

fLanguage

English

Publisher

ieee

Conference_Titel

Big Data, 2013 IEEE International Conference on

Conference_Location

Silicon Valley, CA

Type

conf

DOI

10.1109/BigData.2013.6691606

Filename

6691606