• DocumentCode
    3687104
  • Title

    Sampling large graphs for anticipatory analytics

  • Author

    Lauren Edwards;Luke Johnson;Maja Milosavljevic;Vijay Gadepally;Benjamin A. Miller

  • Author_Institution
    Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, 02420, United States
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    The characteristics of Big Data - often dubbed the 3V's for volume, velocity, and variety - will continue to outpace the ability of computational systems to process, store, and transmit meaningful results. Traditional techniques for dealing with large datasets often include the purchase of larger systems, greater human-in-the-loop involvement, or more complex algorithms. We are investigating the use of sampling to mitigate these challenges, specifically sampling large graphs. Often, large datasets can be represented as graphs where data entries may be edges, and vertices may be attributes of the data. In particular, we present the results of sampling for the task of link prediction. Link prediction is a process to estimate the probability of a new edge forming between two vertices of a graph, and it has numerous application areas in understanding social or biological networks. In this paper we propose a series of techniques for the sampling of large datasets. In order to quantify the effect of these techniques, we present the quality of link prediction tasks on sampled graphs, and the time saved in calculating link prediction statistics on these sampled graphs.
  • Keywords
    "Sampling methods","Approximation methods","Matrix decomposition","Tensile stress","Prediction methods","Timing","Market research"
  • Publisher
    ieee
  • Conference_Titel
    High Performance Extreme Computing Conference (HPEC), 2015 IEEE
  • Type

    conf

  • DOI
    10.1109/HPEC.2015.7322451
  • Filename
    7322451