• DocumentCode
    1330980
  • Title

    Towards Unbiased BFS Sampling

  • Author

    Kurant, Maciej ; Markopoulou, Athina ; Thiran, Patrick

  • Author_Institution
    California Inst. for Telecommun. & Inf. Technol. (CalIT2), Univ. of California, Irvine, CA, USA
  • Volume
    29
  • Issue
    9
  • fYear
    2011
  • fDate
    10/1/2011 12:00:00 AM
  • Firstpage
    1799
  • Lastpage
    1809
  • Abstract
    Breadth First Search (BFS) is a widely used approach for sampling large graphs. However, it has been empirically observed that BFS sampling is biased toward high-degree nodes, which may strongly affect the measurement results. In this paper, we quantify and correct the degree bias of BFS. First, we consider a random graph RG(pk) with an arbitrary degree distribution pk. For this model, we calculate the node degree distribution expected to be observed by BFS as a function of the fraction f of covered nodes. We also show that, for RG(pk), all commonly used graph traversal techniques (BFS, DFS, Forest Fire, Snowball Sampling, RDS) have exactly the same bias. Next, we propose a practical BFS-bias correction procedure that takes as input a collected BFS sample together with the fraction f. Our correction technique is exact (i.e., leads to unbiased estimation) for RG(pk). Furthermore, it performs well when applied to a broad range of Internet topologies and to two large BFS samples of Facebook and Orkut networks.
  • Keywords
    graph theory; graphs; sampling methods; social networking (online); telecommunication network topology; tree searching; Facebook; Orkut networks; breadth first search; graphs; high-degree nodes; node degree distribution; random graph; unbiased BFS sampling; Equations; Facebook; Fires; Indexes; Mathematical model; Peer to peer computing; Scattering; Breadth-First-Search (BFS); bias; estimation; network topology; online social networks; sampling methods;
  • fLanguage
    English
  • Journal_Title
    Selected Areas in Communications, IEEE Journal on
  • Publisher
    ieee
  • ISSN
    0733-8716
  • Type

    jour

  • DOI
    10.1109/JSAC.2011.111005
  • Filename
    6027862