• DocumentCode
    847828
  • Title
    Network Inference From Co-Occurrences
  • Author
    Rabbat, Michael G. ; Figueiredo, Mário A. T. ; Nowak, Robert D.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., McGill Univ., Montreal, QC
  • Volume
    54
  • Issue
    9
  • fYear
    2008
  • Firstpage
    4053
  • Lastpage
    4068
  • Abstract
    The discovery of networks is a fundamental problem arising in numerous fields of science and technology, including communication systems, biology, sociology, and neuroscience. Unfortunately, it is often difficult, or impossible, to obtain data that directly reveal network structure, and so one must infer a network from incomplete data. This paper considers inferring network structure from "co-occurrence" data: observations that identify which network components (e.g., switches, routers, genes) carry each transmission but do not indicate the order in which they handle the transmission. Without order information, the number of networks that are consistent with the data grows exponentially with the size of the network (i.e., the number of nodes). Yet, the basic engineering/evolutionary principles underlying most networks strongly suggest that not all data-consistent networks are equally likely. In particular, nodes that co-occur in many observations are probably closely connected. With this in mind, we model the co-occurrence observations as independent realizations of a random walk on the network, subjected to a random permutation to account for the lack of order information. Treating permutations as missing data, we derive an expectation-maximization (EM) algorithm for estimating the random walk parameters. The model and EM algorithm significantly simplify the problem, but the computational complexity of the reconstruction process does grow exponentially in the length of each transmission path. For networks with long paths, the exact E-step may be computationally intractable. We propose a polynomial-time Monte Carlo EM algorithm based on importance sampling and derive conditions that ensure convergence of the algorithm with high probability. Simulations and experiments with Internet measurements demonstrate the promise of this approach.
  • Keywords
    Monte Carlo methods; computational complexity; network theory (graphs); parameter estimation; probability; random processes; Internet measurements; co-occurrence modeling; importance sampling; network inference; polynomial-time Monte Carlo expectation-maximization algorithm; random permutation; random walk parameter estimation; reconstruction process; Communication switching; Communications technology; Computer networks; Data engineering; Neuroscience; Sociology; Switches; Systems biology; Expectation–maximization (EM) algorithm; Markov models; graphical models; network tomography
  • fLanguage
    English
  • Journal_Title
    Information Theory, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9448
  • Type
    jour

  • DOI
    10.1109/TIT.2008.926315
  • Filename
    4608990
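
The abstract describes treating the unknown ordering of each co-occurrence observation as missing data in an EM algorithm, with an exact E-step whose cost grows exponentially in path length. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a first-order Markov chain with a strictly positive initial distribution `pi` and transition matrix `A`, observations given as tuples of distinct node indices, and brute-force enumeration of all orderings. The function names `permutation_posterior` and `em_step` are hypothetical.

```python
from itertools import permutations

import numpy as np


def permutation_posterior(nodes, pi, A):
    """Posterior over all orderings of an unordered node set.

    Assumes pi and A are strictly positive so the weights never sum to zero.
    Enumerating every ordering costs O(k!) for k nodes, which is why the
    exact E-step becomes impractical for long paths.
    """
    orders = list(permutations(nodes))
    weights = np.array([
        pi[o[0]] * np.prod([A[o[i], o[i + 1]] for i in range(len(o) - 1)])
        for o in orders
    ])
    return orders, weights / weights.sum()


def em_step(observations, pi, A):
    """One EM iteration: expected counts (E-step), then normalization (M-step)."""
    n = len(pi)
    pi_counts = np.zeros(n)
    A_counts = np.zeros((n, n))
    for nodes in observations:
        orders, post = permutation_posterior(nodes, pi, A)
        for order, w in zip(orders, post):
            pi_counts[order[0]] += w              # expected start-node count
            for a, b in zip(order[:-1], order[1:]):
                A_counts[a, b] += w               # expected transition count
    new_pi = pi_counts / pi_counts.sum()
    row_sums = A_counts.sum(axis=1, keepdims=True)
    # Rows with no expected transitions fall back to a uniform distribution.
    new_A = np.divide(
        A_counts, row_sums,
        out=np.full_like(A_counts, 1.0 / n),
        where=row_sums > 0,
    )
    return new_pi, new_A


if __name__ == "__main__":
    n = 3
    pi0 = np.full(n, 1.0 / n)
    A0 = np.full((n, n), 1.0 / n)
    obs = [(0, 1, 2), (2, 1)]  # unordered co-occurrence sets (illustrative data)
    pi1, A1 = em_step(obs, pi0, A0)
    print(pi1)
    print(A1)
```

The factorial cost of `permutation_posterior` is the bottleneck the abstract refers to; the Monte Carlo variant mentioned there would replace the full enumeration with sampled orderings, which this sketch does not attempt.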