Author_Institution :
Sch. of Eng. & Inf. Technol., Univ. of New South Wales, Canberra, ACT, Australia
Abstract :
Host-based intrusion detection systems (HIDSs), especially anomaly-based ones, have received much attention over the past few decades. However, the data sets traditionally used to evaluate HIDSs have lost most of their relevance because computer systems have developed substantially since they were collected. To fill this gap, the ADFA Linux data set (ADFA-LD) has recently been released; it comprises thousands of system call traces collected from a contemporary Linux local server and is intended to become a new benchmark for evaluating HIDSs. In this paper, we perform a preliminary analysis of ADFA-LD in an attempt to extract information useful for developing new host-based anomaly detection systems (HADSs). In line with general concerns raised by the community, we analyse several typical features of the ADFA-LD traces, such as length, common patterns and frequency. Furthermore, we implement a simple k-nearest-neighbour (kNN)-based HADS and evaluate it on ADFA-LD. The experimental results show that, although acceptable performance can be achieved for a few attack types, there is still a long way to go before we fully understand the complex behaviour of modern computer systems and, ultimately, realise more intelligent HADSs.
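To illustrate the kind of kNN-based HADS mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes that each trace is a list of integer system call numbers (ADFA-LD stores traces as sequences of system call numbers), that the feature is a normalised system call frequency vector, that distance is Euclidean, and that a trace's anomaly score is the mean distance to its k nearest normal training traces. The names `frequency_vector`, `knn_score` and `is_anomalous`, the vocabulary size and the threshold are hypothetical choices; the abstract does not specify the feature set, distance or decision rule used in the paper.

```python
import math
from collections import Counter


def frequency_vector(trace, vocab_size):
    """Normalised system call frequency vector for one trace.

    Assumption: `trace` is a list of integer system call numbers and
    `vocab_size` bounds them; out-of-range calls are ignored.
    """
    counts = Counter(trace)
    total = len(trace)
    vec = [0.0] * vocab_size
    for call, n in counts.items():
        if 0 <= call < vocab_size:
            vec[call] = n / total
    return vec


def knn_score(trace, normal_vectors, k, vocab_size):
    """Mean Euclidean distance to the k nearest normal training vectors.

    Larger scores mean the trace looks less like normal behaviour.
    """
    v = frequency_vector(trace, vocab_size)
    dists = sorted(math.dist(v, n) for n in normal_vectors)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def is_anomalous(trace, normal_vectors, k, threshold, vocab_size):
    """Flag a trace whose kNN score exceeds a hypothetical threshold."""
    return knn_score(trace, normal_vectors, k, vocab_size) > threshold


# Toy usage with made-up traces (not ADFA-LD data).
VOCAB = 8
normal_traces = [[1, 2, 3, 1, 2, 3], [1, 2, 3, 3, 2, 1], [1, 2, 2, 3, 1, 3]]
normal_vecs = [frequency_vector(t, VOCAB) for t in normal_traces]
```

Frequency vectors discard call ordering, which keeps the detector simple but means it cannot distinguish traces that use the same calls in a different order; sequence-based features would be needed for that.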