Title :
Stream mining over fluctuating network traffic at variable data rates
Author :
Hang, Yang ; Fong, Simon
Author_Institution :
Fac. of Sci. & Technol., Univ. of Macau, Macau, China
Date :
Nov. 30 2010-Dec. 2 2010
Abstract :
Data stream mining algorithms, such as the popular classifier implemented by the Hoeffding tree algorithm (HTA), are acclaimed for their ability to handle high-speed data streams that are potentially infinite in length. Applying HTA to applications that require real-time responses and fast decision making has recently become a hot research area. In particular, we found that Internet traffic affects the Hoeffding bound (HB), one of the key performance indicators in HTA stream mining, through its fluctuation. The HB error oscillates with the fluctuation of the data rate in a real-time data stream, which causes frequent HTA tree reconstruction and in turn has an adverse effect on the overall prediction accuracy. From the experiment in this paper, we observe that the HB is related to HTA's accuracy, and that data streams extracted from Internet traffic exhibit highly variable data rates that significantly influence the HB value. This paper proposes a simple and effective mechanism for smoothing the HB fluctuation that does not require arbitrating or intervening with the traffic data rates. Our simulation results show that the HB fluctuation is smoothed and the accuracy of HTA is stabilized. We believe the proposed technique can alleviate the problem of stream mining in network environments where traffic fluctuates.
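For readers unfamiliar with the Hoeffding bound, the following is a minimal sketch of the standard formula, epsilon = sqrt(R^2 * ln(1/delta) / (2n)), and of how it reacts to a changing sample count. It is not the smoothing mechanism proposed in the paper, which the abstract does not specify. The function name, the arrival counts, and the choice to evaluate the bound on per-interval counts are assumptions made purely for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: sqrt(R^2 * ln(1/delta) / (2n)).

    value_range -- R, the range of the observed statistic
    delta       -- allowed probability that the bound fails (0 < delta < 1)
    n           -- number of observations seen
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


# Hypothetical numbers of instances arriving in successive time intervals
# (illustrative values only, not data from the paper).
arrivals = [200, 50, 400, 25, 300]

R = 1.0       # assumed range, e.g. information gain for a two-class problem
DELTA = 1e-7  # assumed confidence parameter

bounds = [hoeffding_bound(R, DELTA, n) for n in arrivals]

for n, eps in zip(arrivals, bounds):
    print(f"n={n:4d}  epsilon={eps:.4f}")
```

Under these assumptions, intervals with fewer arrivals yield a larger bound, so a fluctuating arrival rate produces a fluctuating bound, which is the kind of oscillation the abstract describes.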
Keywords :
Internet; data mining; decision making; real-time systems; telecommunication traffic; trees (mathematics); HB fluctuation; HTA stream mining; HTA tree reconstruction; Hoeffding bound; Hoeffding tree algorithm; Internet traffic; data rate fluctuation; data stream mining algorithm; data streams extraction; fast decision making; fluctuating network traffic; high speed data streams; key performance indicators; network environment; overall prediction accuracy; real-time data stream; real-time responses; traffic data rates; variable data rates; Accuracy; Decision trees; Error analysis; Fluctuations; real time application; real time constraint; stream mining
Conference_Title :
Advanced Information Management and Service (IMS), 2010 6th International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-4244-8599-4
Electronic_ISBN :
978-89-88678-32-9