Using Bipartite Anomaly Features for Cyber Security Applications

Author

Eric Goodman;Joe Ingram;Shawn Martin;Dirk Grunwald

Author_Institution

Sandia Nat. Labs., Albuquerque, NM, USA

fYear

2015

Firstpage

301

Lastpage

306

Abstract

In this paper we use anomaly scores derived from a technique for bipartite graphs as features for a supervised machine learning algorithm applied to two cyber security problems: classifying Short Message Service (SMS) text messages as spam or non-spam, and detecting malicious lateral movement within a network. Although the problems are disparate, both can be represented as bipartite graphs (messages and terms for spam, users and computers for authentication), so we can compute bipartite anomaly scores in each case. The bipartite anomaly scores are not very predictive by themselves, but used as auxiliary features they can improve the receiver operating characteristic (ROC) curve of a supervised classifier. We examine the UCI SMS Spam Collection Data Set for the spam problem and an authentication graph from Los Alamos National Laboratory for the lateral movement problem. We create features by dimensionality reduction through principal component analysis (PCA) on the message-term or user-computer matrix, and then augment those features with anomaly scores. Using the anomaly scores, we improve the area under the ROC curve (AUC) by up to 27.5% for the spam data and 21.4% for the authentication data.
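
The feature-augmentation idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy messages, the function names, and in particular `rarity_score`, which is a hypothetical stand-in and not the bipartite anomaly technique used in the paper. The PCA step and the supervised classifier are omitted; the sketch only shows a message-term bipartite matrix, a per-message score appended as an auxiliary feature, and a rank-based AUC computation.

```python
import math

def build_bipartite(messages):
    """Return (terms, rows), where rows[i][j] is 1 if message i contains term j."""
    tokenized = [set(m.lower().split()) for m in messages]
    terms = sorted(set().union(*tokenized))
    rows = [[1 if t in toks else 0 for t in terms] for toks in tokenized]
    return terms, rows

def rarity_score(rows):
    """Hypothetical stand-in score: mean -log(document frequency) of a message's terms.

    This is NOT the paper's anomaly technique; it only plays the role of
    "some score derived from the bipartite graph".
    """
    n = len(rows)
    doc_freq = [sum(col) for col in zip(*rows)]
    scores = []
    for row in rows:
        vals = [-math.log(doc_freq[j] / n) for j, x in enumerate(row) if x]
        scores.append(sum(vals) / len(vals) if vals else 0.0)
    return scores

def augment(features, scores):
    """Append one auxiliary score to each feature vector."""
    return [list(f) + [s] for f, s in zip(features, scores)]

def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy data, invented for this sketch.
messages = [
    "win free prize now",
    "see you at lunch",
    "free prize claim now",
    "lunch at noon",
]
terms, rows = build_bipartite(messages)
scores = rarity_score(rows)
augmented = augment(rows, scores)
```

In a fuller pipeline, the rows passed to `augment` would be the PCA-reduced features rather than the raw incidence rows, and `roc_auc` would be applied to classifier outputs with and without the extra column to measure the change in AUC.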

Keywords

"Feature extraction","Bipartite graph","Authentication","Principal component analysis","Supervised learning","Electronic mail"

Publisher

IEEE

Conference_Title

Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on

Type

conf

DOI

10.1109/ICMLA.2015.69

Filename

7424325