DocumentCode :
3438532
Title :
Towards Optimal Symbolization for Time Series Comparisons
Author :
Smith, Graeme ; Goulding, James ; Barrack, Duncan
Author_Institution :
Horizon Digital Econ. Res., Univ. of Nottingham, Nottingham, UK
fYear :
2013
fDate :
7-10 Dec. 2013
Firstpage :
646
Lastpage :
653
Abstract :
The abundance of large time series datasets, and the value of mining them, have long been acknowledged. Such datasets are ubiquitous in fields ranging from astronomy to biology and web science, and their size and number continue to increase, a situation exacerbated by the exponential growth of our digital footprints. The prevalence and potential utility of these data have led to a vast number of time series data mining techniques, many of which require symbolization of the raw time series as a pre-processing step; this step is typically performed with one of a number of well-used, pre-existing approaches from the literature. In this work we note that these standard approaches are sub-optimal in (at least) the broad application area of time series comparison, leading to unnecessary data corruption and potential performance loss before any real data mining takes place. To address this, we present a novel quantizer based upon optimization of comparison fidelity, together with a computationally tractable algorithm for implementing it on big datasets. We demonstrate empirically that our new approach provides a statistically significant reduction in the amount of error introduced by the symbolization process compared to the current state of the art. The approach therefore provides a more accurate input for the vast number of data mining techniques in the literature, offering the potential for increased real-world performance across a wide range of existing data mining algorithms and applications.
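The abstract describes symbolization only at a high level and does not give the authors' optimized quantizer. As a hypothetical illustration of what "error introduced by the symbolization process" can mean for time series comparison, the sketch below uses a SAX-style baseline (z-normalization plus equiprobable Gaussian breakpoints, without SAX's PAA dimensionality-reduction step). It assumes, for this sketch only, that each symbol is reconstructed as the conditional mean of N(0,1) within its region and that comparisons use Euclidean distance; it then measures how much the distance between two series changes after quantization. All function names are illustrative, and this is not the quantizer proposed in the paper.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def z_normalize(series):
    """Rescale a series to zero mean and unit (population) variance."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0.0:
        return [0.0] * n  # flat series: nothing to rescale
    return [(v - mean) / std for v in series]


def gaussian_breakpoints(alphabet_size):
    """Cut points splitting N(0,1) into equiprobable regions (SAX-style baseline)."""
    return [STD_NORMAL.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def reconstruction_levels(breakpoints):
    """Assumed decoder: conditional mean of N(0,1) inside each region, one per symbol."""
    edges = [-math.inf] + list(breakpoints) + [math.inf]
    levels = []
    for lo, hi in zip(edges, edges[1:]):
        pdf_lo = 0.0 if math.isinf(lo) else STD_NORMAL.pdf(lo)
        pdf_hi = 0.0 if math.isinf(hi) else STD_NORMAL.pdf(hi)
        cdf_lo = 0.0 if math.isinf(lo) else STD_NORMAL.cdf(lo)
        cdf_hi = 1.0 if math.isinf(hi) else STD_NORMAL.cdf(hi)
        # E[X | lo < X < hi] = (phi(lo) - phi(hi)) / (Phi(hi) - Phi(lo))
        levels.append((pdf_lo - pdf_hi) / (cdf_hi - cdf_lo))
    return levels


def symbolize(series, breakpoints):
    """Map each value to the index of the breakpoint region it falls into."""
    return [bisect_right(breakpoints, v) for v in series]


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def comparison_error(x, y, breakpoints, levels):
    """|d(x, y) - d(x_hat, y_hat)|: distortion the quantizer adds to one comparison."""
    x, y = z_normalize(x), z_normalize(y)
    x_hat = [levels[s] for s in symbolize(x, breakpoints)]
    y_hat = [levels[s] for s in symbolize(y, breakpoints)]
    return abs(euclidean(x, y) - euclidean(x_hat, y_hat))


if __name__ == "__main__":
    rng = random.Random(0)

    def random_walk(n):
        walk, total = [], 0.0
        for _ in range(n):
            total += rng.gauss(0.0, 1.0)
            walk.append(total)
        return walk

    pairs = [(random_walk(128), random_walk(128)) for _ in range(200)]
    for k in (4, 8, 16):
        bps = gaussian_breakpoints(k)
        lv = reconstruction_levels(bps)
        mean_err = sum(comparison_error(x, y, bps, lv) for x, y in pairs) / len(pairs)
        print(f"alphabet={k:2d}  mean comparison error={mean_err:.4f}")
```

The fixed Gaussian breakpoints are chosen without regard to how distances between series are affected; a quantizer optimized for comparison fidelity, as the abstract describes, would instead choose its thresholds and levels to reduce an error measure of this kind.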
Keywords :
data mining; time series; big datasets; comparison fidelity optimization; computationally tractable algorithm; data mining algorithms; optimal symbolization; quantizer; symbolization process; time series comparisons; Approximation methods; Data mining; Equations; Mathematical model; Quantization (signal); Simulated annealing; Time series analysis; optimization; quantization; symbolization; time series;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4799-3143-9
Type :
conf
DOI :
10.1109/ICDMW.2013.59
Filename :
6753981