Title :
Performance evaluation of a machine learning algorithm for early application identification
Author :
Verticale, Giacomo ; Giacomazzi, Paolo
Author_Institution :
Dipt. di Elettron. e Inf., Politec. di Milano, Milan
Abstract :
The early identification of applications through the observation and fast analysis of the associated packet flows is a critical building block of intrusion detection and policy enforcement systems. The simple techniques currently used in practice, such as looking at the transport port numbers or at the application payload, are increasingly ineffective for new applications that use random port numbers and/or encryption. Therefore, there is increasing interest in machine learning techniques capable of identifying applications by examining features of the associated traffic process, such as packet lengths and inter-arrival times. However, these techniques require that the classification algorithm be trained with examples of the traffic generated by the applications to be identified, possibly on the link where the classifier will operate. In this paper we provide two new contributions. First, we apply the C4.5 decision tree algorithm to the problem of early application identification (i.e., looking at the first packets of the flow) and show that it performs better than the algorithms proposed in the literature. Second, we evaluate the performance of the classifier when training is performed on a link different from the link where the classifier operates. This is an important issue, as a pre-trained portable classifier would greatly facilitate the deployment and management of the classification infrastructure.
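As a rough illustration of the kind of split criterion a C4.5-style tree relies on, the sketch below computes the information gain ratio for a threshold on one per-flow feature. It is a simplified sketch, not the authors' implementation: the feature (length of the first packet), the application labels, and the six toy flows are hypothetical values invented for this example, and real C4.5 adds pruning and further corrections that are omitted here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` (simplified)."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


def best_threshold(values, labels):
    """Return the observed value that maximizes the gain ratio, or None."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split
    return max(candidates, key=lambda t: gain_ratio(values, labels, t),
               default=None)


# Hypothetical toy data: first-packet length in bytes and application label.
first_packet_len = [60, 64, 70, 1200, 1400, 1460]
app_label = ["ssh", "ssh", "ssh", "http", "http", "http"]

threshold = best_threshold(first_packet_len, app_label)
print(threshold, gain_ratio(first_packet_len, app_label, threshold))
```

In a full tree, the same selection would be repeated recursively over several early-packet features (for example lengths and inter-arrival times of the first few packets), and the cross-link evaluation described above would amount to fitting on flows from one link and scoring on flows from another.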
Keywords :
cryptography; decision trees; learning (artificial intelligence); software performance evaluation; telecommunication traffic; C4.5 decision tree algorithm; associated packet flows; associated traffic process; classification algorithm; encryption; intrusion detection; machine learning algorithm; performance evaluation; policy enforcement systems; random port numbers; transport port numbers; Bayesian methods; Classification algorithms; Clustering algorithms; Computer science; Hidden Markov models; Inspection; Machine learning; Machine learning algorithms; Payloads; Peer to peer computing;
Conference_Title :
Computer Science and Information Technology, 2008. IMCSIT 2008. International Multiconference on
Conference_Location :
Wisla
Print_ISBN :
978-83-60810-14-9
DOI :
10.1109/IMCSIT.2008.4747340