DocumentCode
3592116
Title
A Scalable Data Stream Mining Methodology: Stream-Based Holistic Analytics and Reasoning in Parallel
Author
Fong, Simon ; Yan Zhuang ; Wong, Raymond ; Mohammed, Sabah
Author_Institution
Dept. of Comput. & Inf. Sci., Univ. of Macau, Macau, China
fYear
2014
Firstpage
110
Lastpage
115
Abstract
Although Big Data is a hyped term that has raised many technical challenges for both academic research communities and commercial IT deployment, the root sources of Big Data are data streams. It is generally known that data sourced from data streams accumulate continuously, making traditional batch-based model induction algorithms infeasible for real-time data mining or, in a broad sense, high-speed data analytics. In this paper, a novel data stream mining methodology called Stream-based Holistic Analytics and Reasoning in Parallel (SHARP) is proposed. SHARP is based on principles of incremental learning that span a typical data-mining model construction process: lightweight feature selection, one-pass incremental decision tree induction, and incremental swarm optimization. The components of SHARP are designed to function together, with the aim of improving classification/prediction performance as far as possible. SHARP is scalable: depending on the computing resources available at runtime, the components can execute in parallel, collectively enhancing different aspects of the overall SHARP process for mining data streams. It is believed that if Big Data are mined by incrementally learning a data mining model, one pass at a time and on the fly, the large volume of such data is no longer a technical issue from the perspective of data analytics. Three computer simulation experiments, pertaining to the three components of SHARP, are presented in this paper to demonstrate its efficacy.
Keywords
Big Data; data analysis; data mining; decision trees; optimisation; SHARP; data-mining model construction process; incremental learning; incremental swarm optimization; lightweight feature selection; one-pass incremental decision tree induction; scalable data stream mining methodology; stream-based holistic analytics and reasoning in parallel; Accuracy; Big data; Classification algorithms; Data mining; Data models; Decision trees; Integrated circuits; CCV feature selection; Cache-based data stream classifier; Data stream mining methodology; Meta-heuristics;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational and Business Intelligence (ISCBI), 2014 2nd International Symposium on
Print_ISBN
978-1-4799-7551-8
Type
conf
DOI
10.1109/ISCBI.2014.31
Filename
7119545