  • DocumentCode
    243792
  • Title
    Big Data Stream Learning with SAMOA
  • Author
    Bifet, Albert ; De Francisci Morales, Gianmarco
  • Author_Institution
    HUAWEI Noah's Ark Lab., Hong Kong, China
  • fYear
    2014
  • fDate
    14 Dec. 2014
  • Firstpage
    1199
  • Lastpage
    1202
  • Abstract
    Big data is flowing into every area of our life, professional and personal. Big data is defined as datasets whose size is beyond the ability of typical software tools to capture, store, manage and analyze, due to the time and memory complexity. Velocity is one of the main properties of big data. In this demo, we present SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. It provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms. It features a pluggable architecture that allows it to run on several distributed stream processing engines such as Storm, S4, and Samza. SAMOA is written in Java and is available at http://samoa-project.net under the Apache Software License version 2.0.
  • Keywords
    Big Data; data mining; learning (artificial intelligence); pattern classification; pattern clustering; regression analysis; Apache Software License version 2.0; Java; S4; SAMOA; Samza; Storm; big data stream learning; big data stream mining; classification task; clustering task; distributed stream processing engines; distributed streaming algorithms; machine learning tasks; open-source platform; pluggable architecture; programming abstractions; regression task; scalable advanced massive online analysis; Algorithm design and analysis; Big data; Data mining; Digital signal processing; Engines; Machine learning algorithms; Storms; Classification; Clustering; Data Streams; Distributed Systems; Machine Learning; Regression; Toolbox;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshop (ICDMW), 2014 IEEE International Conference on
  • Conference_Location
    Shenzhen
  • Print_ISBN
    978-1-4799-4275-6
  • Type
    conf
  • DOI
    10.1109/ICDMW.2014.24
  • Filename
    7022733