Author_Institution :
Dept. of Comput. Sci., KAIST, Daejeon, South Korea
Abstract :
Given a real-world graph, how can we find a large subgraph whose partition quality is much better than that of the original? Graph partitioning has received great attention in graph mining, and balanced graph partitioning in particular is required in many real-world applications. However, balanced graph partitioning is known to be NP-hard, and moreover it is known that real graphs have no good cut at large scales. Because of this difficulty, in this paper we propose a new paradigm for graph partitioning. Instead of dealing with the whole graph, we focus on finding a large subgraph with high-quality partitions, in terms of conductance. We show that removing problematic nodes, i.e., the large-degree nodes called hub nodes in real graphs, remarkably decreases conductance for the remaining giant connected component (GCC), while the number of nodes in the GCC remains significant. In experiments, we demonstrate that our method finds a subgraph of quite large size with low-conductance graph partitions, compared with competing methods. We also show that the competitors cannot find connected subgraphs, while our method does by construction. This improvement in partition quality for the subgraph is especially noticeable for large-scale cuts: for a balanced partition, conductance drops to as low as 14% of the original, with a GCC size of 70% of the total. As a result, the found subgraph has clear partitions at almost all scales compared with the original, and this result especially helps find communities that are well formed but hidden by hubs at various scales in real-world graphs such as social networks.
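The idea of removing hubs and keeping the GCC can be illustrated with a minimal sketch. This is not the paper's implementation: the helper names (`build_graph`, `remove_hubs`, `giant_component`, `conductance`), the toy graph, and the choice to remove a fixed number of top-degree nodes are all assumptions made for illustration. Conductance is taken here in its common form, the number of cut edges divided by the smaller of the two sides' volumes (sum of degrees), which the abstract does not spell out.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def remove_hubs(adj, num_hubs):
    """Return a copy of the graph without the `num_hubs` highest-degree nodes.

    Assumption: hubs are simply the top-degree nodes; the paper may select them differently.
    """
    hubs = set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:num_hubs])
    return {v: nbrs - hubs for v, nbrs in adj.items() if v not in hubs}


def giant_component(adj):
    """Return the subgraph induced by the largest connected component (found by BFS)."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in comp:
                    comp.add(w)
                    queue.append(w)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return {v: adj[v] & best for v in best}


def conductance(adj, part):
    """Cut edges between `part` and the rest, divided by the smaller side's volume."""
    part = set(part)
    cut = sum(1 for u in part for w in adj[u] if w not in part)
    vol_part = sum(len(adj[u]) for u in part)
    vol_rest = sum(len(adj[u]) for u in adj if u not in part)
    denom = min(vol_part, vol_rest)
    return cut / denom if denom else float("inf")


# Hypothetical toy graph: two triangles joined by one edge, plus a hub (node 6)
# attached to every other node, which blurs the otherwise clear two-way split.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
edges += [(6, v) for v in range(6)]

adj = build_graph(edges)
pruned = giant_component(remove_hubs(adj, num_hubs=1))

before = conductance(adj, {0, 1, 2})
after = conductance(pruned, {0, 1, 2})
print(f"{before:.3f} -> {after:.3f}")  # 0.400 -> 0.143
```

In this toy case the same node set {0, 1, 2} goes from conductance 4/10 to 1/7 once the hub is removed, while the GCC still contains six of the seven nodes. The numbers are specific to this example and say nothing about the magnitudes reported in the paper.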
Keywords :
graph theory; set theory; GCC; connected subgraphs; giant connected component; graph conductance; graph partitioning; partition quality; real world graph; subset discovery; Communities; Image edge detection; Measurement; Partitioning algorithms; Social network services; Time complexity; Balanced Graph Partition; Conductance; Graph Partition