iF2: An Interpretable Fuzzy Rule Filter for Web Log Post-Compromised Malicious Activity Monitoring

Author

Chih-Hung Hsieh ; Yu-Siang Shen ; Chao-Wen Li ; Jain-Shing Wu

Author_Institution

Inst. of Informaiton Ind., Taipei, Taiwan

fYear

2015

fDate

24-26 May 2015

Firstpage

130

Lastpage

137

Abstract

To alleviate the loads of tracking web log file by human effort, machine learning methods are now commonly used to analyze log data and to identify the pattern of malicious activities. Traditional kernel based techniques, like the neural network and the support vector machine (SVM), typically can deliver higher prediction accuracy. However, the user of a kernel based techniques normally cannot get an overall picture about the distribution of the data set. On the other hand, logic based techniques, such as the decision tree and the rule-based algorithm, feature the advantage of presenting a good summary about the distinctive characteristics of different classes of data such that they are more suitable to generate interpretable feedbacks to domain experts. In this study, a real web-access log dataset from a certain organization was collected. An efficient interpretable fuzzy rule filter (iF²) was proposed as a filter to analyze the data and to detect suspicious internet addresses from the normal ones. The historical information of each internet address recorded in web log file is summarized as multiple statistics. And the design process of iF² is elaborately modeled as a parameter optimization problem which simultaneously considers 1) maximizing prediction accuracy, 2) minimizing number of used rules, and 3) minimizing number of selected statistics. Experimental results show that the fuzzy rule filter constructed with the proposed approach is capable of delivering superior prediction accuracy in comparison with the conventional logic based classifiers and the expectation maximization based kernel algorithm. On the other hand, though it cannot match the prediction accuracy delivered by the SVM, however, when facing real web log file where the ratio of positive and negative cases is extremely unbalanced, the proposed iF² of having optimization flexibility results in a better recall rate and enjoys one major advantage due to providing th- user with an overall picture of the underlying distributions.

Keywords

Internet; data mining; fuzzy set theory; learning (artificial intelligence); neural nets; pattern classification; statistical analysis; support vector machines; Internet address; SVM; Web log file tracking; Web log post-compromised malicious activity monitoring; Web-access log dataset; decision tree; expectation maximization based kernel algorithm; fuzzy rule filter; iF²; interpretable fuzzy rule filter; kernel based techniques; log data analysis; logic based classifiers; logic based techniques; machine learning methods; malicious activities; neural network; parameter optimization problem; recall rate; rule-based algorithm; support vector machine; Accuracy; Internet; Kernel; Monitoring; Optimization; Prediction algorithms; Support vector machines; Fuzzy Rule Based Filter; Machine Learning; Parameter Optimization; Pattern Recognition; Post-Compromised Threat Identification; Web Log Analysis;

fLanguage

English

Publisher

ieee

Conference_Titel

Information Security (AsiaJCIS), 2015 10th Asia Joint Conference on

Conference_Location

Kaohsiung

Type

conf

DOI

10.1109/AsiaJCIS.2015.19

Filename

7153947

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3095210