• DocumentCode
    3095210
  • Title

    iF2: An Interpretable Fuzzy Rule Filter for Web Log Post-Compromised Malicious Activity Monitoring

  • Author

    Chih-Hung Hsieh ; Yu-Siang Shen ; Chao-Wen Li ; Jain-Shing Wu

  • Author_Institution
    Inst. of Informaiton Ind., Taipei, Taiwan
  • fYear
    2015
  • fDate
    24-26 May 2015
  • Firstpage
    130
  • Lastpage
    137
  • Abstract
    To alleviate the loads of tracking web log file by human effort, machine learning methods are now commonly used to analyze log data and to identify the pattern of malicious activities. Traditional kernel based techniques, like the neural network and the support vector machine (SVM), typically can deliver higher prediction accuracy. However, the user of a kernel based techniques normally cannot get an overall picture about the distribution of the data set. On the other hand, logic based techniques, such as the decision tree and the rule-based algorithm, feature the advantage of presenting a good summary about the distinctive characteristics of different classes of data such that they are more suitable to generate interpretable feedbacks to domain experts. In this study, a real web-access log dataset from a certain organization was collected. An efficient interpretable fuzzy rule filter (iF2) was proposed as a filter to analyze the data and to detect suspicious internet addresses from the normal ones. The historical information of each internet address recorded in web log file is summarized as multiple statistics. And the design process of iF2 is elaborately modeled as a parameter optimization problem which simultaneously considers 1) maximizing prediction accuracy, 2) minimizing number of used rules, and 3) minimizing number of selected statistics. Experimental results show that the fuzzy rule filter constructed with the proposed approach is capable of delivering superior prediction accuracy in comparison with the conventional logic based classifiers and the expectation maximization based kernel algorithm. On the other hand, though it cannot match the prediction accuracy delivered by the SVM, however, when facing real web log file where the ratio of positive and negative cases is extremely unbalanced, the proposed iF2 of having optimization flexibility results in a better recall rate and enjoys one major advantage due to providing th- user with an overall picture of the underlying distributions.
  • Keywords
    Internet; data mining; fuzzy set theory; learning (artificial intelligence); neural nets; pattern classification; statistical analysis; support vector machines; Internet address; SVM; Web log file tracking; Web log post-compromised malicious activity monitoring; Web-access log dataset; decision tree; expectation maximization based kernel algorithm; fuzzy rule filter; iF2; interpretable fuzzy rule filter; kernel based techniques; log data analysis; logic based classifiers; logic based techniques; machine learning methods; malicious activities; neural network; parameter optimization problem; recall rate; rule-based algorithm; support vector machine; Accuracy; Internet; Kernel; Monitoring; Optimization; Prediction algorithms; Support vector machines; Fuzzy Rule Based Filter; Machine Learning; Parameter Optimization; Pattern Recognition; Post-Compromised Threat Identification; Web Log Analysis;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Information Security (AsiaJCIS), 2015 10th Asia Joint Conference on
  • Conference_Location
    Kaohsiung
  • Type

    conf

  • DOI
    10.1109/AsiaJCIS.2015.19
  • Filename
    7153947