Title :
A Covering Classification Rule Induction Approach for Big Datasets
Author :
Vasilis Kolias;Ioannis Anagnostopoulos;Eleftherios Kayafas
Author_Institution :
Sch. of Electr. &
Abstract :
With the ever-increasing production of data from various heterogeneous sources in modern information societies, the need for scalable, data-intensive processing is growing. MapReduce quickly became the de facto framework for large-scale data analysis, owing to its simple and abstract programming model and its efficient underlying execution system. However, this simplicity comes at a price: its unidirectional communication model and its lack of support for iterations make repeated querying of datasets difficult and impose limitations in many fields, including Machine Learning. In this paper we describe the implementation of a classification rule induction algorithm based on MapReduce, with the aim of building a classification model within as few iterations as possible. After a thorough description of the algorithm, we evaluate its performance from three perspectives: accuracy, parallel performance and communication cost. The evaluations indicate that the approach is scalable and, since it produces a comprehensible, human-readable model, it can prove valuable for a wide range of applications.
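The abstract does not spell out the algorithm itself. Purely as orientation, here is a minimal sketch of a generic PRISM-style covering rule learner in plain Python, assuming categorical attributes. The statistics for each rule-specialisation step are gathered by a map function that emits ((attribute, value, class), 1) pairs and a reduce function that sums them. It runs in a single process, and the names `map_phase`, `reduce_phase`, `best_term` and `learn_rules` are hypothetical. It is not the authors' MapReduce implementation.

```python
from collections import Counter

# Hypothetical sketch: a PRISM-style covering learner whose per-step
# statistics are computed in a map/reduce style (single process only).

def map_phase(records):
    # Emit ((attribute, value, label), 1) for every attribute-value pair.
    for attrs, label in records:
        for attr, value in attrs.items():
            yield (attr, value, label), 1

def reduce_phase(pairs):
    # Sum the emitted counts per (attribute, value, label) key.
    counts = Counter()
    for key, one in pairs:
        counts[key] += one
    return counts

def best_term(counts, target, used):
    # Pick the unused (attribute, value) with the highest precision for
    # the target class; ties are broken by the number of target hits.
    totals, hits = Counter(), Counter()
    for (attr, value, label), n in counts.items():
        if attr in used:
            continue
        totals[(attr, value)] += n
        if label == target:
            hits[(attr, value)] += n
    if not totals:
        return None
    return max(totals, key=lambda av: (hits[av] / totals[av], hits[av]))

def matches(attrs, conditions):
    return all(attrs.get(a) == v for a, v in conditions.items())

def learn_rules(records, target):
    # Covering loop: learn one rule, remove the target records it covers, repeat.
    rules = []
    remaining = list(records)
    while any(label == target for _, label in remaining):
        conditions = {}
        covered = remaining
        # Specialise the rule until it covers only the target class
        # or no unused attributes are left.
        while any(label != target for _, label in covered):
            counts = reduce_phase(map_phase(covered))  # one "pass" over the data
            term = best_term(counts, target, conditions.keys())
            if term is None:
                break
            attr, value = term
            conditions[attr] = value
            covered = [r for r in covered if matches(r[0], conditions)]
        rules.append(dict(conditions))
        remaining = [r for r in remaining
                     if not (matches(r[0], conditions) and r[1] == target)]
    return rules

# Toy, made-up data for illustration only.
weather = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "overcast", "windy": "yes"}, "play"),
]
rules = learn_rules(weather, "play")
print(rules)  # [{'windy': 'no'}, {'outlook': 'overcast'}]
```

In this sketch every iteration of the inner loop corresponds to one full map/reduce pass over the currently covered records. That illustrates why the number of iterations is the main cost driver when a covering learner runs on MapReduce, which is the concern the abstract raises.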
Keywords :
"Training","Machine learning algorithms","Big data","Accuracy","Radiation detectors","Production","Clustering algorithms"
Conference_Title :
Big Data Computing (BDC), 2014 IEEE/ACM International Symposium on
DOI :
10.1109/BDC.2014.17