  • DocumentCode
    244967
  • Title
    Composite Likelihood Data Augmentation for Within-Network Statistical Relational Learning

  • Author
    Pfeiffer, Joseph J.; Neville, Jennifer; Bennett, Paul N.

  • Author_Institution
    Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA
  • fYear
    2014
  • fDate
    14-17 Dec. 2014
  • Firstpage
    490
  • Lastpage
    499
  • Abstract
    The prevalence of datasets that can be represented as networks has recently fueled a great deal of work in the area of Relational Machine Learning (RML). Due to the statistical correlations between linked nodes in the network, many RML methods focus on predicting node features (i.e., labels) using the network relationships. However, many domains consist of a single, partially labeled network. Thus, relational versions of Expectation Maximization (i.e., R-EM), which jointly learn parameters and infer the missing labels, can outperform methods that learn parameters from the labeled data and apply them for inference on the unlabeled nodes. Although R-EM methods can significantly improve predictive performance in networks that are densely labeled, they do not achieve the same gains in sparsely labeled networks and can perform worse than RML methods. In this work, we show that the fixed-point methods R-EM uses for approximate learning and inference result in errors that prevent convergence in sparsely labeled networks. We then propose two methods that do not experience this problem. First, we develop a Relational Stochastic EM (R-SEM) method, which uses stochastic parameters that are not as susceptible to approximation errors. Then we develop a Relational Data Augmentation (R-DA) method, which integrates over a range of stochastic parameter values for inference. R-SEM and R-DA can use any collective RML algorithm for learning and inference in partially labeled networks. We analyze their performance with two RML learners over four real-world datasets, and show that they outperform independent learning, RML, and R-EM -- particularly in sparsely labeled networks.
  • Keywords
    approximation theory; data handling; expectation-maximisation algorithm; learning (artificial intelligence); network theory (graphs); stochastic processes; R-DA method; R-EM methods; R-SEM method; RML methods; approximate learning; approximation errors; collective RML algorithm; composite likelihood data augmentation; expectation maximization; fixed-point methods; linked nodes; network relationships; node feature prediction; partially-labeled network; predictive performance; relational data augmentation method; relational inference; relational machine learning; relational stochastic EM method; relational versions; sparsely labeled networks; statistical correlations; stochastic parameters; within-network statistical relational learning; Approximation algorithms; Approximation methods; Estimation; Inference algorithms; Joints; Markov processes; Data Augmentation; Mixing Rate; Nonlinear Dynamical Systems; Statistical Relational Learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2014 IEEE International Conference on
  • Conference_Location
    Shenzhen
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4799-4303-6
  • Type
    conf
  • DOI
    10.1109/ICDM.2014.151
  • Filename
    7023366