Performance evaluation of a machine learning algorithm for early application identification

Author

Verticale, Giacomo ; Giacomazzi, Paolo

Author_Institution

Dipt. di Elettron. e Inf., Politec. di Milano, Milan

fYear

2008

fDate

20-22 Oct. 2008

Firstpage

845

Lastpage

849

Abstract

The early identification of applications through the observation and fast analysis of the associated packet flows is a critical building block of intrusion detection and policy enforcement systems. The simple techniques currently used in practice, such as looking at the transport port numbers or at the application payload, are increasingly less effective for new applications using random port numbers and/or encryption. Therefore, there is increasing interest in machine learning techniques capable of identifying applications by examining features of the associated traffic process such as packet lengths and inter-arrival times. However, these techniques require that the classification algorithm be trained with examples of the traffic generated by the applications to be identified, possibly on the link where the classifier will operate. In this paper, we provide two new contributions. First, we apply the C4.5 decision tree algorithm to the problem of early application identification (i.e., looking at the first packets of the flow) and show that it has better performance than the algorithms proposed in the literature. Second, we evaluate the performance of the classifier when training is performed on a link different from the link where the classifier operates. This is an important issue, as a pre-trained portable classifier would greatly facilitate the deployment and management of the classification infrastructure.
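
As a rough illustration of the idea described in the abstract, the sketch below builds a feature vector from the first packets of a flow (packet lengths and inter-arrival times) and scores a candidate threshold split with the gain ratio, the criterion C4.5 uses to choose splits. It is not the authors' implementation: the function names, the choice of n = 4 packets, the zero padding of short flows, and the toy labels are all assumptions made for the example, since the abstract does not specify the feature set.

```python
import math
from collections import Counter


def early_features(packets, n=4):
    """Build a feature vector from the first n packets of a flow.

    packets: list of (timestamp_seconds, length_bytes) in arrival order.
    Returns n packet lengths followed by n - 1 inter-arrival times.
    Short flows are padded with zeros (an assumption of this sketch).
    """
    head = packets[:n]
    lengths = [length for _, length in head]
    gaps = [later[0] - earlier[0] for earlier, later in zip(head, head[1:])]
    lengths += [0] * (n - len(lengths))
    gaps += [0.0] * ((n - 1) - len(gaps))
    return lengths + gaps


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split value <= threshold.

    This is information gain divided by the split information,
    which is the split criterion used by C4.5.
    """
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    total = len(labels)
    remainder = (len(left) / total) * entropy(left) \
        + (len(right) / total) * entropy(right)
    gain = entropy(labels) - remainder
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


# Hypothetical flows: first-packet lengths and made-up application labels.
first_lengths = [60, 70, 1400, 1500]
apps = ["dns", "dns", "http", "http"]
score = gain_ratio(first_lengths, apps, threshold=100)
```

To mimic the cross-link experiment mentioned in the abstract, one would compute such features on flows captured on one link, fit the tree there, and measure accuracy on flows from a different link.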

Keywords

cryptography; decision trees; learning (artificial intelligence); software performance evaluation; telecommunication traffic; C4.5 decision tree algorithm; associated packet flows; associated traffic process; classification algorithm; encryption; intrusion detection; machine learning algorithm; performance evaluation; policy enforcement systems; random port numbers; transport port numbers; Bayesian methods; Classification algorithms; Clustering algorithms; Computer science; Hidden Markov models; Inspection; Machine learning; Machine learning algorithms; Payloads; Peer to peer computing;

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer Science and Information Technology, 2008. IMCSIT 2008. International Multiconference on

Conference_Location

Wisla

Print_ISBN

978-83-60810-14-9

Type

conf

DOI

10.1109/IMCSIT.2008.4747340

Filename

4747340