Title :
Exploring Data Streams with Nonparametric Estimators
Author :
Heinz, Christoph ; Seeger, Bernhard
Author_Institution :
Dept. of Math. & Comput. Sci., Philipps Univ. Marburg
Abstract :
A variety of real-world applications requires a meaningful online analysis of transient data streams. An important building block of many analysis tasks is the characterization of the underlying data distribution. Sophisticated techniques from the area of nonparametric statistics provide a well-defined estimation of continuous data distributions. The analysis of data streams can benefit from these techniques; however, the rigid processing requirements of streams render a direct application impossible. In our work, we tackle the adaptation of nonparametric techniques to streaming data. We concentrate on density estimation as it provides a convenient basis for the exploration of an unknown continuous data distribution. Specifically, we have developed kernel- and wavelet-based density estimators for data streams in compliance with their processing requirements. Both techniques are incorporated into PIPES, our Java library for advanced data stream processing and analysis. In the demonstration, we present our nonparametric density estimators over data streams and show their performance for a variety of heterogeneous data streams from different real-world application scenarios. We also present the implementation of further analysis tasks on top of our estimators by means of illustrative use cases.
Keywords :
Java; data mining; data visualisation; graphical user interfaces; nonparametric statistics; software libraries; wavelet transforms; Java library; PIPES; continuous data distributions; data stream processing; kernel-based density estimators; nonparametric density estimators; wavelet-based density estimators; Computer science; Data analysis; Data visualization; Graphical user interfaces; Heart rate; Libraries; Mathematics; Statistical analysis; Statistical distributions; Transient analysis
Conference_Title :
Scientific and Statistical Database Management, 2006. 18th International Conference on
Conference_Location :
Vienna
Print_ISBN :
0-7695-2590-3
DOI :
10.1109/SSDBM.2006.25