Title :
Big Data Stream Learning with SAMOA
Author :
Bifet, Albert ; De Francisci Morales, Gianmarco
Author_Institution :
HUAWEI Noah's Ark Lab., Hong Kong, China
Abstract :
Big data is flowing into every area of our lives, professional and personal. Big data is defined as datasets whose size is beyond the ability of typical software tools to capture, store, manage, and analyze because of time and memory complexity. Velocity is one of the main properties of big data. In this demo, we present SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. It provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks, such as classification, clustering, and regression, as well as programming abstractions for developing new algorithms. It features a pluggable architecture that allows it to run on several distributed stream processing engines, such as Storm, S4, and Samza. SAMOA is written in Java and is available at http://samoa-project.net under the Apache Software License version 2.0.
Keywords :
Big Data; data mining; learning (artificial intelligence); pattern classification; pattern clustering; regression analysis; Apache Software License version 2.0; Java; S4; SAMOA; Samza; Storm; big data stream learning; big data stream mining; classification task; clustering task; distributed stream processing engines; distributed streaming algorithms; machine learning tasks; open-source platform; pluggable architecture; programming abstractions; regression task; scalable advanced massive online analysis; Algorithm design and analysis; Big data; Data mining; Digital signal processing; Engines; Machine learning algorithms; Storms; Classification; Clustering; Data Streams; Distributed Systems; Machine Learning; Regression; Toolbox;
Conference_Title :
Data Mining Workshop (ICDMW), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4799-4275-6
DOI :
10.1109/ICDMW.2014.24