• DocumentCode
    3687110
  • Title

    Improving the performance of graph analysis through partitioning with sampling

  • Author

    Michael M. Wolf;Benjamin A. Millery

  • Author_Institution
    Sandia National Laboratories, Albuquerque, NM 87185, United States
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Numerous applications focus on the analysis of entities and the connections between them, and such data are naturally represented as graphs. In particular, the detection of a small subset of vertices with anomalous coordinated connectivity is of broad interest, for problems such as detecting strange traffic in a computer network or unknown communities in a social network. Eigenspace analysis of large-scale graphs is useful for dimensionality reduction of these large, noisy data sets into a more tractable analysis problem. When performing this sort of analysis across many parallel processes, the data partitioning scheme may have a significant impact on the overall running time. Previous work demonstrated that partitioning based on a sampled subset of edges still yields a substantial improvement in running time. In this work, we study this further, exploring how different sampling strategies, graph community structure, and the vertex degree distribution affect the partitioning quality. We show that sampling is an effective technique when partitioning for data analytics problems with community-like structure.
  • Keywords
    "Sparse matrices","Stochastic processes","Laboratories","Runtime","Kernel","Generators","Signal processing"
  • Publisher
    ieee
  • Conference_Titel
    High Performance Extreme Computing Conference (HPEC), 2015 IEEE
  • Type

    conf

  • DOI
    10.1109/HPEC.2015.7322457
  • Filename
    7322457