• DocumentCode
    3438424
  • Title

    In-Core Computation of Geometric Centralities with HyperBall: A Hundred Billion Nodes and Beyond

  • Author

    Boldi, Paolo ; Vigna, Sebastiano

  • Author_Institution
    Dipt. di Inf., Univ. degli Studi di Milano, Milan, Italy
  • fYear
    2013
  • fDate
    7-10 Dec. 2013
  • Firstpage
    621
  • Lastpage
    628
  • Abstract
    Given a social network, which of its nodes are more central? This question has been asked many times in sociology, psychology and computer science, and a whole plethora of centrality measures (a.k.a. centrality indices, or rankings) were proposed to account for the importance of the nodes of a network. In this paper, we approach the problem of computing geometric centralities, such as closeness [1] and harmonic centrality [2], on very large graphs; traditionally this task requires an all-pairs shortest-path computation in the exact case, or a number of breadth-first traversals for approximated computations, but these techniques yield very weak statistical guarantees on highly disconnected graphs. We rather assume that the graph is accessed in a semi-streaming fashion, that is, that adjacency lists are scanned almost sequentially, and that a very small amount of memory (in the order of a dozen bytes) per node is available in core memory. We leverage the newly discovered algorithms based on HyperLogLog counters [3], making it possible to approximate a number of geometric centralities at a very high speed and with high accuracy. While the application of similar algorithms for the approximation of closeness was attempted in the MapReduce [4] framework [5], our exploitation of HyperLogLog counters reduces exponentially the memory footprint, paving the way for in-core processing of networks with a hundred billion nodes using "just" 2TiB of RAM. Moreover, the computations we describe are inherently parallelizable, and scale linearly with the number of available cores.
  • Keywords
    distributed processing; graph theory; mathematics computing; HyperBall; HyperLogLog counters; MapReduce framework; RAM; adjacency lists; breadth-first traversals; centrality measures; computer science; disconnected graphs; geometric centralities; harmonic centrality; in-core computation; psychology; semistreaming fashion; shortest-path computation; social network; sociology; statistical guarantees; Approximation algorithms; Approximation methods; Harmonic analysis; Memory management; Radiation detectors; Registers; Standards; Centrality; Distance distribution; Graph algorithms; Probabilistic counters;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on
  • Conference_Location
    Dallas, TX
  • Print_ISBN
    978-1-4799-3143-9
  • Type

    conf

  • DOI
    10.1109/ICDMW.2013.10
  • Filename
    6753978