DocumentCode
2961974
Title
Fast parallel outlier detection for categorical datasets using MapReduce
Author
Koufakou, Anna ; Secretan, Jimmy ; Reeder, John ; Cardona, Kelvin ; Georgiopoulos, Michael
Author_Institution
Sch. of EECS, Univ. of Central Florida, Orlando, FL
fYear
2008
fDate
1-8 June 2008
Firstpage
3298
Lastpage
3304
Abstract
Outlier detection has received considerable attention in many applications, such as detecting network attacks or credit card fraud. The massive datasets currently available for mining in some of these outlier detection applications require large parallel systems, and consequently parallelizable outlier detection methods. Most existing outlier detection methods assume that all of the attributes of a dataset are numerical, usually have a quadratic time complexity with respect to the number of points in the dataset, and quite often require multiple dataset scans. In this paper, we propose a fast parallel outlier detection strategy based on the Attribute Value Frequency (AVF) approach, a high-speed, scalable outlier detection method for categorical data that is inherently easy to parallelize. Our proposed solution, MR-AVF, is based on the MapReduce paradigm for parallel programming, which offers load balancing and fault tolerance. MR-AVF is particularly simple to develop, and it is shown to be highly scalable with respect to the number of cluster nodes.
Keywords
computational complexity; fault tolerant computing; parallel programming; resource allocation; secondary ion mass spectra; security of data; MapReduce paradigm; attribute value frequency; categorical datasets; credit card fraud; fast parallel outlier detection; fault tolerance; load balancing; network attacks detection; parallel systems; quadratic time complexity; scalable outlier detection; Breast cancer; Frequency; Network servers; Neural networks
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on
Conference_Location
Hong Kong
ISSN
1098-7576
Print_ISBN
978-1-4244-1820-6
Electronic_ISBN
1098-7576
Type
conf
DOI
10.1109/IJCNN.2008.4634266
Filename
4634266