Title :
Composite Likelihood Data Augmentation for Within-Network Statistical Relational Learning
Author :
Pfeiffer, Joseph J. ; Neville, Jennifer ; Bennett, Paul N.
Author_Institution :
Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA
Abstract :
The prevalence of datasets that can be represented as networks has recently fueled a great deal of work in the area of Relational Machine Learning (RML). Due to the statistical correlations between linked nodes in the network, many RML methods focus on predicting node features (i.e., labels) using the network relationships. However, many domains consist of a single, partially labeled network. Thus, relational versions of Expectation Maximization (i.e., R-EM), which jointly learn parameters and infer the missing labels, can outperform methods that learn parameters from the labeled data and apply them for inference on the unlabeled nodes. Although R-EM methods can significantly improve predictive performance in networks that are densely labeled, they do not achieve the same gains in sparsely labeled networks and can perform worse than RML methods. In this work, we show that the fixed-point methods that R-EM uses for approximate learning and inference result in errors that prevent convergence in sparsely labeled networks. We then propose two methods that do not experience this problem. First, we develop a Relational Stochastic EM (R-SEM) method, which uses stochastic parameters that are not as susceptible to approximation errors. Then we develop a Relational Data Augmentation (R-DA) method, which integrates over a range of stochastic parameter values for inference. R-SEM and R-DA can use any collective RML algorithm for learning and inference in partially labeled networks. We analyze their performance with two RML learners over four real-world datasets, and show that they outperform independent learning, RML, and R-EM, particularly in sparsely labeled networks.
Keywords :
approximation theory; data handling; expectation-maximisation algorithm; learning (artificial intelligence); network theory (graphs); stochastic processes; R-DA method; R-EM methods; R-SEM method; RML methods; approximate learning; approximation errors; collective RML algorithm; composite likelihood data augmentation; expectation maximization; fixed-point methods; linked nodes; network relationships; node feature prediction; partially-labeled network; predictive performance; relational data augmentation method; relational inference; relational machine learning; relational stochastic EM method; relational versions; sparsely labeled networks; statistical correlations; stochastic parameters; within-network statistical relational learning; Approximation algorithms; Approximation methods; Estimation; Inference algorithms; Joints; Markov processes; Data Augmentation; Mixing Rate; Nonlinear Dynamical Systems; Statistical Relational Learning;
Conference_Title :
Data Mining (ICDM), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4799-4303-6
DOI :
10.1109/ICDM.2014.151