Regional Information Center for Science and Technology (RICeST) - A Parallel and Incremental Approach for Data-Intensive Learning of Bayesian Networks

DocumentCode :

3601200

Title :

A Parallel and Incremental Approach for Data-Intensive Learning of Bayesian Networks

Author :

Kun Yue ; Qiyu Fang ; Xiaoling Wang ; Jin Li ; Weiyi Liu

Author_Institution :

Dept. of Comput. Sci. & Eng., Yunnan Univ., Kunming, China

Volume :

Issue :

Year :

2015

Firstpage :

2890

Lastpage :

2904

Abstract :

The Bayesian network (BN) has been adopted as the underlying model for representing and inferring uncertain knowledge. As the basis of realistic applications centered on probabilistic inferences, learning a BN from data is a critical subject of machine learning, artificial intelligence, and big data paradigms. Currently, it is necessary to extend the classical methods for learning BNs to data-intensive computing and cloud environments. In this paper, we propose a parallel and incremental approach for data-intensive learning of BNs from massive, distributed, and dynamically changing data by extending the classical scoring and search algorithm and using MapReduce. First, we adopt the minimum description length as the scoring metric and give two-pass MapReduce-based algorithms for computing the required marginal probabilities and scoring the candidate graphical model from sample data. Then, we give the corresponding strategy for extending the classical hill-climbing algorithm to obtain the optimal structure, as well as a strategy for storing a BN as <key, value> pairs. Further, in view of the dynamic characteristics of the changing data, we give the concept of influence degree to measure the coincidence of the current BN with new data, and then propose the corresponding two-pass MapReduce-based algorithms for incremental learning of BNs. Experimental results show the efficiency, scalability, and effectiveness of our methods.
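As a rough illustration of the scoring idea summarized in the abstract, the following Python sketch counts (child value, parent configuration) occurrences per data partition in a map step, merges the partial counts in a reduce step, and computes a minimum description length (MDL) score for a candidate structure. It is not the paper's algorithm: the function names (`map_counts`, `reduce_counts`, `mdl_score`), the dictionary-based structure encoding, and the toy data are all assumptions made for this example, and the map and reduce steps run in-process rather than on an actual MapReduce cluster.

```python
from collections import Counter
from functools import reduce
from math import log


def map_counts(partition, structure):
    """Map step (hypothetical): count (child, child_value, parent_values) keys in one partition."""
    counts = Counter()
    for row in partition:
        for child, parents in structure.items():
            key = (child, row[child], tuple(row[p] for p in parents))
            counts[key] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step (hypothetical): merge two partial count tables by key."""
    return a + b


def mdl_score(counts, structure, arity, n_rows):
    """MDL = negative log-likelihood + (log N / 2) * number of free parameters."""
    # Total count per (child, parent configuration), used as the denominator.
    parent_totals = Counter()
    for (child, _, pa), n in counts.items():
        parent_totals[(child, pa)] += n
    neg_ll = -sum(
        n * log(n / parent_totals[(child, pa)])
        for (child, _, pa), n in counts.items()
    )
    # Free parameters: (r_i - 1) * q_i for each variable i.
    n_params = 0
    for child, parents in structure.items():
        q = 1
        for p in parents:
            q *= arity[p]
        n_params += (arity[child] - 1) * q
    return neg_ll + 0.5 * log(n_rows) * n_params


def score_structure(partitions, structure, arity):
    """Run the map step on every partition, reduce the results, then score."""
    partial = [map_counts(part, structure) for part in partitions]
    counts = reduce(reduce_counts, partial, Counter())
    n_rows = sum(len(part) for part in partitions)
    return mdl_score(counts, structure, arity, n_rows)


# Toy data (assumed): B always equals A, split across two partitions.
partitions = [
    [{"A": 0, "B": 0}, {"A": 1, "B": 1}],
    [{"A": 0, "B": 0}, {"A": 1, "B": 1}],
]
arity = {"A": 2, "B": 2}

score_dep = score_structure(partitions, {"A": [], "B": ["A"]}, arity)
score_indep = score_structure(partitions, {"A": [], "B": []}, arity)
```

On this toy data the structure with the edge A → B receives the lower (better) MDL score, which is the comparison a hill-climbing search would use when deciding whether to keep a candidate edge.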

Keywords :

Big Data; belief networks; inference mechanisms; learning (artificial intelligence); parallel algorithms; probability; BN incremental learning; Bayesian networks; artificial intelligence; big data; candidate graphical model; classical hill-climbing algorithm; cloud environment; data-intensive computing; data-intensive learning; distributed data; dynamically changing data; incremental approach; knowledge inference; machine learning; marginal probability; massive data; minimum description length; parallel approach; probabilistic inference; scoring algorithm; scoring metric; search algorithm; two-pass MapReduce-based algorithm; uncertain knowledge representation; Algorithm design and analysis; Computational modeling; Data models; Distributed databases; Heuristic algorithms; Parallel algorithms; Probability; Bayesian network learning; MapReduce; data-intensive computing; incremental learning; parallel algorithm; uncertain knowledge;

Language :

English

Journal_Title :

Cybernetics, IEEE Transactions on

Publisher :

IEEE

ISSN :

2168-2267

Type :

jour

DOI :

10.1109/TCYB.2015.2388791

Filename :

7018001

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3601200