• DocumentCode
    2961974
  • Title
    Fast parallel outlier detection for categorical datasets using MapReduce
  • Author
    Koufakou, Anna ; Secretan, Jimmy ; Reeder, John ; Cardona, Kelvin ; Georgiopoulos, Michael
  • Author_Institution
    Sch. of EECS, Univ. of Central Florida, Orlando, FL
  • fYear
    2008
  • fDate
    1-8 June 2008
  • Firstpage
    3298
  • Lastpage
    3304
  • Abstract
    Outlier detection has received considerable attention in many applications, such as detecting network attacks or credit card fraud. The massive datasets currently available for mining in some of these outlier detection applications require large parallel systems, and consequently parallelizable outlier detection methods. Most existing outlier detection methods assume that all of the attributes of a dataset are numerical, usually have a quadratic time complexity with respect to the number of points in the dataset, and quite often require multiple dataset scans. In this paper, we propose a fast parallel outlier detection strategy based on the Attribute Value Frequency (AVF) approach, a high-speed, scalable outlier detection method for categorical data that is inherently easy to parallelize. Our proposed solution, MR-AVF, is based on the MapReduce paradigm for parallel programming, which offers load balancing and fault tolerance. MR-AVF is particularly simple to develop and is shown to be highly scalable with respect to the number of cluster nodes.
  • Keywords
    computational complexity; fault tolerant computing; parallel programming; resource allocation; secondary ion mass spectra; security of data; MapReduce paradigm; attribute value frequency; categorical datasets; credit card fraud; fast parallel outlier detection; fault tolerance; load balancing; network attacks detection; parallel systems; quadratic time complexity; scalable outlier detection; Breast cancer; Frequency; Network servers; Neural networks
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on
  • Conference_Location
    Hong Kong
  • ISSN
    1098-7576
  • Print_ISBN
    978-1-4244-1820-6
  • Electronic_ISBN
    1098-7576
  • Type
    conf
  • DOI
    10.1109/IJCNN.2008.4634266
  • Filename
    4634266