• DocumentCode
    2014879
  • Title

    Sampling directed graphs with random walks

  • Author

    Ribeiro, Bruno ; Wang, Pinghui ; Murai, Fabricio ; Towsley, Don

  • Author_Institution
    Comput. Sci. Dept., Univ. of Massachusetts, Amherst, MA, USA
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    1692
  • Lastpage
    1700
  • Abstract
    Despite recent efforts to characterize complex networks such as citation graphs or online social networks (OSNs), little attention has been given to developing tools that can be used to characterize directed graphs in the wild, where no pre-processed data is available. The presence of hidden incoming edges but observable outgoing edges poses a challenge to characterize large directed graphs through crawling, as existing sampling methods cannot cope with hidden incoming links. The driving principle behind our random walk (RW) sampling method is to construct, in real-time, an undirected graph from the directed graph such that the random walk on the directed graph is consistent with one on the undirected graph. We then use the RW on the undirected graph to estimate the outdegree distribution. Our algorithm accurately estimates outdegree distributions of a variety of real world graphs. We also study the hardness of indegree distribution estimation when indegrees are latent (i.e., incoming links are only observed as outgoing edges). We observe that, in the same scenarios, indegree distribution estimates are highly innacurate unless the directed graph is highly symmetrical.
  • Keywords
    complex networks; directed graphs; estimation theory; random processes; citation graph; complex network; directed graph sampling; indegree distribution estimation; online social network; outdegree distribution estimation; random walk sampling method; undirected graph; Google; World Wide Web;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    INFOCOM, 2012 Proceedings IEEE
  • Conference_Location
    Orlando, FL
  • ISSN
    0743-166X
  • Print_ISBN
    978-1-4673-0773-4
  • Type

    conf

  • DOI
    10.1109/INFCOM.2012.6195540
  • Filename
    6195540