DocumentCode :
539337
Title :
An experimental comparison of decision trees in traditional data mining and data stream mining
Author :
Hang, Yang ; Fong, Simon
Author_Institution :
Fac. of Sci. & Technol., Univ. of Macau, Macau, China
fYear :
2010
fDate :
Nov. 30 2010-Dec. 2 2010
Firstpage :
442
Lastpage :
447
Abstract :
Data stream mining (DSM) is claimed to be the successor of traditional data mining because it can mine continuously arriving data streams in real time with acceptable performance. Many computer applications have moved to an online, on-demand basis, where fresh data arrive at high speed. Not only must a decision response be made rapidly, but the trained decision tree model must also be updated as often as new data arrive. Traditional data mining assumes that training datasets are structured and static, and decision tree models are either refreshed in batches or never refreshed. That is, the full dataset is scanned completely (sometimes several times), and, in the case of constructing a C4.5 decision tree, rules are induced by a greedy algorithm that proceeds in a divide-and-conquer manner. DSM, on the other hand, progressively builds and renews the decision tree model each time a new pass of data comes by. In this paper, we evaluated the performance of a popular decision tree in DSM, the Hoeffding Tree, vis-à-vis that of C4.5. A good mix of dataset types was used in the experiments to investigate the apparent differences between the two decision trees. An open-source DSM simulator was programmed in Java for the experiments.
Keywords :
Java; data mining; decision trees; divide and conquer methods; greedy algorithms; public domain software; Hoeffding tree; JAVA programming; data stream mining; decision response; decision tree model; divide-and-conquer algorithm; greedy algorithm; open source DSM simulator; Accuracy; Computational modeling; Data mining; Data models; Decision trees; Noise; Portable media players; Hoeffding tree algorithm; JAVA simulator; data stream mining; decision tree; noise data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advanced Information Management and Service (IMS), 2010 6th International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-4244-8599-4
Electronic_ISBN :
978-89-88678-32-9
Type :
conf
Filename :
5713491