DocumentCode :
659587
Title :
Data chaos: An entropy based MapReduce framework for scalable learning
Author :
Jiaoyan Chen ; Huajun Chen ; Xi Chen ; Guozhou Zheng ; Zhaohui Wu
fYear :
2013
fDate :
6-9 Oct. 2013
Firstpage :
71
Lastpage :
78
Abstract :
Data chaos is the total unpredictability of all data elements and can be quantified by Shannon entropy. In this paper, we first propose an entropy-based theoretical framework for machine learning, which states that chaos in the sample data decreases and the learned rules improve as learning progresses. However, applying the theoretical framework is usually time-consuming, because groups of rules must be trained iteratively and data chaos must be recalculated in each iteration. To implement the theoretical framework for scalable learning, we propose a MapReduce-based distributed computational framework. In a classification case study, the framework trains multiple classifiers in parallel and calculates the chaos of the sample set in each iteration, then resamples a small subset of samples with the highest entropy for training in the next iteration, reducing chaos in the sample data as quickly as possible. Using typical classification benchmarks, our experiments report the entropy of the sample data and show that the theoretical framework is sound and can help improve the accuracy of machine learning. The computational framework also achieves high efficiency and scalability for large-scale learning on a Hadoop cluster.
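The resampling step described in the abstract can be illustrated with a minimal single-machine sketch. The abstract does not give the paper's exact definition of data chaos, so this sketch assumes it is the sum of per-sample Shannon entropies computed over the labels that several classifiers vote for. All function names here are hypothetical, and the MapReduce distribution is not reproduced.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sample_entropies(votes_per_sample):
    """Entropy of each sample's votes (one predicted label per classifier)."""
    return [shannon_entropy(votes) for votes in votes_per_sample]


def data_chaos(votes_per_sample):
    """Assumed definition: total chaos is the sum of per-sample entropies."""
    return sum(sample_entropies(votes_per_sample))


def highest_entropy_subset(votes_per_sample, k):
    """Indices of the k samples whose classifier votes are most uncertain."""
    ent = sample_entropies(votes_per_sample)
    return sorted(range(len(ent)), key=lambda i: ent[i], reverse=True)[:k]


# Hypothetical votes from four classifiers on three samples.
votes = [
    ["a", "a", "a", "a"],  # full agreement: entropy 0
    ["a", "b", "a", "b"],  # even split: entropy 1 bit
    ["a", "a", "a", "b"],  # mild disagreement
]
print(highest_entropy_subset(votes, 2))
```

In a distributed setting, the per-sample entropy computation would be a natural candidate for a map step and the summation and top-k selection for a reduce step, though the abstract does not state how the authors partition the work.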
Keywords :
entropy; learning (artificial intelligence); parallel processing; pattern classification; Hadoop cluster; MapReduce based distributed computational framework; Shannon entropy; classification benchmark; computational framework; data chaos; data element total unpredictability; entropy based MapReduce framework; entropy based theoretic framework; large scale learning; machine learning; multiple classifier parallel training; scalable learning; Accuracy; Benchmark testing; Chaos; Entropy; Prediction algorithms; Training; Uncertainty; Chaos; Entropy; Machine Learning; MapReduce;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
Type :
conf
DOI :
10.1109/BigData.2013.6691736
Filename :
6691736