DocumentCode
3756783
Title
Using Bipartite Anomaly Features for Cyber Security Applications
Author
Eric Goodman;Joe Ingram;Shawn Martin;Dirk Grunwald
Author_Institution
Sandia Nat. Labs., Albuquerque, NM, USA
fYear
2015
Firstpage
301
Lastpage
306
Abstract
In this paper, we use anomaly scores derived from a technique for bipartite graphs as features for a supervised machine learning algorithm applied to two cyber security problems: classifying Short Message Service (SMS) text messages as either spam or non-spam, and detecting malicious lateral movement within a network. Although the two problems are disparate, both spam detection and lateral movement detection can be modeled as bipartite graphs, so we can compute bipartite anomaly scores in each case. The bipartite anomaly scores are not very predictive by themselves, but when used as auxiliary features they can improve the receiver operating characteristic (ROC) curve of a supervised classifier. We examine the UCI SMS Spam Collection Data Set for the spam problem and an authentication graph from Los Alamos National Laboratory for the lateral movement problem. We create features by dimensionality reduction through principal component analysis (PCA) on the message-term or user-computer matrix, and then augment those features with anomaly scores. By using the anomaly scores, we are able to improve the area under the ROC curve (AUC) by up to 27.5% for the spam data and 21.4% for the authentication data.
Keywords
"Feature extraction","Bipartite graph","Authentication","Principal component analysis","Supervised learning","Electronic mail"
Publisher
IEEE
Conference_Titel
Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on
Type
conf
DOI
10.1109/ICMLA.2015.69
Filename
7424325