Improving the performance of graph analysis through partitioning with sampling

Author

Michael M. Wolf;Benjamin A. Millery

Author_Institution

Sandia National Laboratories, Albuquerque, NM 87185, United States

fYear

2015

Firstpage

1

Lastpage

6

Abstract

Numerous applications focus on the analysis of entities and the connections between them, and such data are naturally represented as graphs. In particular, the detection of a small subset of vertices with anomalous coordinated connectivity is of broad interest, for problems such as detecting strange traffic in a computer network or unknown communities in a social network. Eigenspace analysis of large-scale graphs is useful for dimensionality reduction of these large, noisy data sets into a more tractable analysis problem. When performing this sort of analysis across many parallel processes, the data partitioning scheme may have a significant impact on the overall running time. Previous work demonstrated that partitioning based on a sampled subset of edges still yields a substantial improvement in running time. In this work, we study this further, exploring how different sampling strategies, graph community structure, and the vertex degree distribution affect the partitioning quality. We show that sampling is an effective technique when partitioning for data analytics problems with community-like structure.

Keywords

"Sparse matrices","Stochastic processes","Laboratories","Runtime","Kernel","Generators","Signal processing"

Publisher

ieee

Conference_Titel

High Performance Extreme Computing Conference (HPEC), 2015 IEEE

Type

conf

DOI

10.1109/HPEC.2015.7322457

Filename

7322457