Title :
Detecting anomalous latent classes in a batch of network traffic flows
Author :
Kocak, Fatih ; Miller, David J. ; Kesidis, George
Author_Institution :
EE & CSE Depts, Penn State Univ., University Park, PA, USA
Abstract :
We focus on detecting samples from anomalous latent classes, “buried” within a collected batch of known (“normal”) class samples. In our setting, each sample has a large number of features. We posit, and confirm experimentally, that careful “feature selection” within unsupervised anomaly detection may be needed to achieve the most accurate results. Our approach effectively selects features (tests), even though no labeled anomalous examples are available to form a basis for standard (supervised) feature selection. We form pairwise feature tests based on bivariate Gaussian mixture null models, with one test for every pair of features. The mixtures are estimated using known class samples (the null “training set”). We then obtain p-values for the test batch samples under the null hypothesis. Subsequently, we calculate approximate joint p-values for candidate anomalous clusters, defined by (sample subset, test subset) pairs. Our approach sequentially detects the most significant clusters of samples in a networking context. We compare our “p-value clustering algorithm”, using ROC curves, with alternative p-value based methods and with the one-class SVM. All the competing methods make sample-wise detections, i.e., they do not jointly detect anomalous clusters. The anomalous class was either an HTTP bot (Zeus) or peer-to-peer (P2P) traffic. Our p-value clustering approach gives promising results for detecting the Zeus bot and P2P traffic amongst Web traffic.
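As a rough illustration of the pairwise test described in the abstract, the sketch below computes a p-value for one sample on one feature pair under an already-fitted bivariate Gaussian mixture null. The abstract does not give the exact p-value formula, so this is an assumption: each component's tail probability (the chi-square survival function with 2 degrees of freedom, evaluated at the squared Mahalanobis distance) is weighted by that component's posterior probability. The function names and the example mixture parameters are hypothetical, and the joint (cluster-level) p-value step is not shown.

```python
import math

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from a Gaussian (mean, cov)."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # v^T * inverse(cov) * v, written out for the 2x2 case
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

def gaussian_pdf(x, mean, cov):
    """Density of a bivariate Gaussian at x."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    return math.exp(-0.5 * mahalanobis_sq(x, mean, cov)) / (2 * math.pi * math.sqrt(det))

def mixture_p_value(x, weights, means, covs):
    """Posterior-weighted tail probability under a bivariate Gaussian mixture.

    Assumed form (not taken from the paper): sum_k P(k | x) * exp(-d_k^2 / 2),
    where exp(-d^2 / 2) is the chi-square (2 dof) survival function.
    """
    dens = [w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs)]
    total = sum(dens)
    if total == 0.0:
        # Point is so far from every component that the densities underflow.
        return 0.0
    tails = [math.exp(-0.5 * mahalanobis_sq(x, m, c)) for m, c in zip(means, covs)]
    return sum(p / total * t for p, t in zip(dens, tails))

# Hypothetical two-component null model for one feature pair.
identity = ((1.0, 0.0), (0.0, 1.0))
weights = [0.5, 0.5]
means = [(0.0, 0.0), (10.0, 10.0)]
covs = [identity, identity]

print(mixture_p_value((0.5, -0.2), weights, means, covs))  # near a component: large p-value
print(mixture_p_value((5.0, 5.0), weights, means, covs))   # far from both: small p-value
```

In a full pipeline, a function like this would be evaluated once per (sample, feature pair), and a small p-value would mark that sample as atypical on that particular test.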
Keywords :
peer-to-peer computing; telecommunication security; telecommunication traffic; HTTP; P2P traffic; ROC curves; Web; Zeus bot; anomalous clusters; anomalous latent classes detection; bivariate Gaussian mixture; feature selection; network traffic flows; networking context; one-class SVM; p-value based methods; p-value clustering algorithm; pairwise feature tests; peer-to-peer traffic; sample-wise detections; standard feature selection; unsupervised anomaly detection; Clustering algorithms; Feature extraction; Joints; Support vector machines; Training; Vectors; anomaly detection; clustering; intrusion detection; mixture models; p-value
Conference_Title :
Information Sciences and Systems (CISS), 2014 48th Annual Conference on
Conference_Location :
Princeton, NJ
DOI :
10.1109/CISS.2014.6814181