DocumentCode :
3165912
Title :
Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized Datasets
Author :
Yankov, Dragomir ; Keogh, Eamonn ; Rebbapragada, Umaa
Author_Institution :
Univ. of California, Riverside
fYear :
2007
fDate :
28-31 Oct. 2007
Firstpage :
381
Lastpage :
390
Abstract :
The problem of finding unusual time series has recently attracted much attention, and several promising methods are now in the literature. However, virtually all proposed methods assume that the data reside in main memory. For many real-world problems this is not the case. For example, in astronomy, multi-terabyte time series datasets are the norm. Most current algorithms faced with data which cannot fit in main memory resort to multiple scans of the disk/tape and are thus intractable. In this work we show how one particular definition of unusual time series, the time series discord, can be discovered with a disk aware algorithm. The proposed algorithm is exact and requires only two linear scans of the disk with a tiny buffer of main memory. Furthermore, it is very simple to implement. We use the algorithm to provide further evidence of the effectiveness of the discord definition in areas as diverse as astronomy, Web query mining, video surveillance, etc., and show the efficiency of our method on datasets which are many orders of magnitude larger than anything else attempted in the literature.
Keywords :
buffer storage; data mining; time series; disk aware algorithm; disk aware discord discovery; linear scans; main memory; memory resort; multiple scans; real-world problems; terabyte sized datasets; tiny buffer; unusual time series; Astronomy; Computer science; Data engineering; Data mining; Intrusion detection; Investments; Portfolios; Search engines; USA Councils; Video surveillance;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on
Conference_Location :
Omaha, NE
ISSN :
1550-4786
Print_ISBN :
978-0-7695-3018-5
Type :
conf
DOI :
10.1109/ICDM.2007.61
Filename :
4470262