Sampling large graphs for anticipatory analytics

Author

Lauren Edwards;Luke Johnson;Maja Milosavljevic;Vijay Gadepally;Benjamin A. Miller

Author_Institution

Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, 02420, United States

fYear

2015

Firstpage

1

Lastpage

6

Abstract

The characteristics of Big Data, often dubbed the 3 V's for volume, velocity, and variety, will continue to outpace the ability of computational systems to process, store, and transmit meaningful results. Traditional techniques for dealing with large datasets often include the purchase of larger systems, greater human-in-the-loop involvement, or more complex algorithms. We are investigating the use of sampling to mitigate these challenges, specifically the sampling of large graphs. Large datasets can often be represented as graphs in which data entries may be edges and vertices may be attributes of the data. In particular, we present the results of sampling for the task of link prediction. Link prediction estimates the probability of a new edge forming between two vertices of a graph, and it has numerous applications in understanding social or biological networks. In this paper we propose a series of techniques for sampling large datasets. To quantify the effect of these techniques, we present the quality of link prediction on sampled graphs and the time saved in calculating link prediction statistics on those graphs.
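
The abstract does not say which sampling techniques or link prediction statistics the paper uses, so the following is only an illustrative sketch of the general idea. It assumes uniform random edge sampling and a common-neighbors score, both chosen here for simplicity; the function names and the toy edge list are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict
from itertools import combinations


def sample_edges(edges, fraction, seed=0):
    """Keep each edge independently with probability `fraction`.

    Assumption: uniform edge sampling, used only for illustration.
    """
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < fraction]


def build_adjacency(edges):
    """Build an undirected adjacency map: vertex -> set of neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbor_scores(adj):
    """Score each non-adjacent vertex pair by its number of shared neighbors.

    Assumption: common neighbors stands in for a link prediction statistic;
    a higher score suggests a more likely future edge.
    """
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            shared = len(adj[u] & adj[v])
            if shared:
                scores[(u, v)] = shared
    return scores


# Hypothetical toy graph given as an undirected edge list.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]

full_scores = common_neighbor_scores(build_adjacency(edges))

sampled = sample_edges(edges, fraction=0.5, seed=1)
sampled_scores = common_neighbor_scores(build_adjacency(sampled))
```

Comparing `sampled_scores` with `full_scores`, and timing the two calls, mirrors the trade-off the abstract describes: prediction quality on the sampled graph against the computation time saved.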

Keywords

"Sampling methods","Approximation methods","Matrix decomposition","Tensile stress","Prediction methods","Timing","Market research"

Publisher

IEEE

Conference_Title

High Performance Extreme Computing Conference (HPEC), 2015 IEEE

Type

conf

DOI

10.1109/HPEC.2015.7322451

Filename

7322451