• DocumentCode
    3776996
  • Title

    Automatic protocol feature word construction based on machine learning

  • Author

    Haifeng Li; Bin Zhang; Bo Shuai; Jian Wang; Chaojing Tang

  • Author_Institution
    School of Electronic Science and Engineering, National University of Defense Technology, Changsha, Hunan, China, 410073
  • fYear
    2015
  • Firstpage
    93
  • Lastpage
    97
  • Abstract
    Automatic protocol reverse engineering of application protocols is becoming increasingly important for applications such as protocol analyzers, penetration testing, and intrusion prevention and detection. Unfortunately, many techniques for extracting the protocol message format specifications of unknown applications are limited by scarce a priori information or by long running times. Protocol feature words are byte subsequences within the traffic payload that can help distinguish application protocols. This paper proposes a new approach for extracting the protocol message format specifications of unknown applications, based on the Latent Dirichlet Allocation (LDA) model and the Huffman Tree Support Vector Machine (HT-SVM). First, keywords are extracted with the LDA model, a machine learning technique that extracts the thematic structure, called topics, from a document collection. Second, the HT-SVM method is applied to construct the feature words from the extracted keywords. The proposed approach is implemented and evaluated by inferring message format specifications of the SMTP binary protocol. Experimental results show that the approach accurately parses and infers the SMTP protocol with a high recall rate.
  • Keywords
    "Protocols","Artificial neural networks","Support vector machines"
  • Publisher
    IEEE
  • Conference_Titel
    Progress in Informatics and Computing (PIC), 2015 IEEE International Conference on
  • Print_ISBN
    978-1-4673-8086-7
  • Type

    conf

  • DOI
    10.1109/PIC.2015.7489816
  • Filename
    7489816