DocumentCode :
1791574
Title :
Distributed Adaptive Model Rules for mining big data streams
Author :
Vu, Anh Thu ; De Francisci Morales, Gianmarco ; Gama, Joao ; Bifet, Albert
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
345
Lastpage :
353
Abstract :
Decision rules are among the most expressive data mining models. We propose the first distributed streaming algorithm to learn decision rules for regression tasks. The algorithm is available in SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. It uses a hybrid of vertical and horizontal parallelism to distribute Adaptive Model Rules (AMRules) on a cluster. The decision rules built by AMRules are comprehensible models, where the antecedent of a rule is a conjunction of conditions on the attribute values, and the consequent is a linear combination of the attributes. Our evaluation shows that this implementation is scalable in relation to CPU and memory consumption. On a small commodity Samza cluster of 9 nodes, it can handle a rate of more than 30000 instances per second, and achieve a speedup of up to 4.7x over the sequential version.
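The rule structure described in the abstract (an antecedent that is a conjunction of conditions on attribute values, and a consequent that is a linear combination of the attributes) can be illustrated with a minimal sketch. All names here (`Condition`, `Rule`, `covers`, `predict`, the attributes `temp` and `humidity`, and the weights) are hypothetical and chosen for illustration; this is not the SAMOA API, and it shows only how a single fixed rule is evaluated, not how AMRules learns or distributes rules.

```python
from dataclasses import dataclass
from typing import Dict, List

# An instance is assumed to be a mapping from attribute name to numeric value.
Instance = Dict[str, float]


@dataclass
class Condition:
    """One test on a single attribute, e.g. temp > 20.0 (hypothetical form)."""
    attribute: str
    op: str          # either "<=" or ">"
    threshold: float

    def holds(self, x: Instance) -> bool:
        value = x[self.attribute]
        return value <= self.threshold if self.op == "<=" else value > self.threshold


@dataclass
class Rule:
    """Antecedent: conjunction of conditions. Consequent: linear model."""
    conditions: List[Condition]
    weights: Dict[str, float]
    bias: float = 0.0

    def covers(self, x: Instance) -> bool:
        # The antecedent is satisfied only if every condition holds.
        return all(c.holds(x) for c in self.conditions)

    def predict(self, x: Instance) -> float:
        # The consequent is a linear combination of the attribute values.
        return self.bias + sum(w * x[a] for a, w in self.weights.items())


# Illustrative rule: IF temp > 20 AND humidity <= 0.5
#                    THEN y = 1.0 + 2.0 * temp - 4.0 * humidity
rule = Rule(
    conditions=[Condition("temp", ">", 20.0), Condition("humidity", "<=", 0.5)],
    weights={"temp": 2.0, "humidity": -4.0},
    bias=1.0,
)

x = {"temp": 25.0, "humidity": 0.25}
if rule.covers(x):
    print(rule.predict(x))
```

A rule of this shape stays readable because the conditions say when the rule applies and the weights say how each attribute contributes to the predicted value.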
Keywords :
Big Data; data mining; public domain software; SAMOA; Samza cluster; big data stream mining; decision rules; distributed AMRules; distributed adaptive model rules; distributed streaming algorithm; expressive data mining models; open-source platform; scalable advanced massive online analysis; Adaptation models; Data mining; Data models; Heat-assisted magnetic recording; Machine learning algorithms; Parallel processing; Predictive models;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004251
Filename :
7004251