• DocumentCode
    3592116
  • Title
    A Scalable Data Stream Mining Methodology: Stream-Based Holistic Analytics and Reasoning in Parallel
  • Author
    Fong, Simon ; Yan Zhuang ; Wong, Raymond ; Mohammed, Sabah
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Univ. of Macau, Macau, China
  • fYear
    2014
  • Firstpage
    110
  • Lastpage
    115
  • Abstract
    Although Big Data is a hype that has given rise to many technical challenges confronting both academic research communities and commercial IT deployment, the root sources of Big Data are founded on data streams. It is generally known that data sourced from data streams accumulate continuously, making traditional batch-based model induction algorithms infeasible for real-time data mining or high-speed data analytics in a broad sense. In this paper, a novel data stream mining methodology called Stream-based Holistic Analytics and Reasoning in Parallel (SHARP) is proposed. SHARP is based on principles of incremental learning that span a typical data-mining model construction process: lightweight feature selection, one-pass incremental decision tree induction, and incremental swarm optimization. These components of SHARP are designed to function together, aiming to improve classification/prediction performance as far as possible. SHARP is scalable: depending on the computing resources available at runtime, the components can execute in parallel, collectively enhancing different aspects of the overall SHARP process for mining data streams. It is believed that if Big Data are mined by incrementally learning a data mining model, one pass at a time on the fly, the large volume of such data is no longer a technical issue from the perspective of data analytics. Three computer simulation experiments, pertaining to three components of SHARP, are presented in this paper to demonstrate its efficacy.
  • Keywords
    Big Data; data analysis; data mining; decision trees; optimisation; Big Data; SHARP; data-mining model construction process; incremental learning; incremental swarm optimization; lightweight feature selection; one-pass incremental decision tree induction; scalable data stream mining methodology; stream-based holistic analytics and reasoning in parallel; Accuracy; Big data; Classification algorithms; Data mining; Data models; Decision trees; Integrated circuits; CCV feature selection; Cache-based data stream classifier; Data stream mining methodology; Meta-heuristics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational and Business Intelligence (ISCBI), 2014 2nd International Symposium on
  • Print_ISBN
    978-1-4799-7551-8
  • Type
    conf
  • DOI
    10.1109/ISCBI.2014.31
  • Filename
    7119545