DocumentCode
3696964
Title
Distributed Discord Discovery: Spark Based Anomaly Detection in Time Series
Author
Yafei Wu;Yongxin Zhu;Tian Huang;Xinyang Li;Xinyi Liu;Mengyun Liu
Author_Institution
Sch. of Microelectron., Shanghai Jiao Tong Univ., Shanghai, China
fYear
2015
Firstpage
154
Lastpage
159
Abstract
The computational complexity of discord discovery is O(m2), where m is the size of time series. Many promising methods were proposed to resolve this compute-intensive problem. These methods sequentially discover discords on standalone machine. The limited capability of standalone machine in terms of computing and memory capacity hinders these methods in discovering discords from large dataset in reasonable time. In this work, we propose a distributed discord discovery method. Our method is able to combine discord results from different computing nodes, which are non-combinable in previous literature. We mitigate the issue of the memory wall by using distributed data partitioning. We implement our method on distributed Spark computing framework and distributed HDFS (Hadoop Distributed File System) storage platform. The implementation exhibits superior scalability and enables discords discovery in multi-dimension time series. We evaluate our method with terabyte-sized dataset, which is larger than any dataset in previous literature. Evaluation results show that our method has clear advantage in terms of performance and efficiency over state-of-the-art algorithms.
Keywords
"Time series analysis","Algorithm design and analysis","Sparks","Microelectronics","Force","Acceleration","Clustering algorithms"
Publisher
ieee
Conference_Titel
High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conferen on Embedded Software and Systems (ICESS), 2015 IEEE 17th International Conference on
Type
conf
DOI
10.1109/HPCC-CSS-ICESS.2015.228
Filename
7336158
Link To Document