Title :
Data Mining for Malicious Code Detection and Security Applications
Author :
Thuraisingham, Bhavani
Author_Institution :
Dept. of Comput. Sci. Eric Jonsson Sch. of Eng. & Comput. Sci., Univ. of Texas at Dallas, Richardson, TX, USA
Abstract :
Summary form only given. Data mining is the process of posing queries and extracting patterns, often previously unknown from large quantities of data using pattern matching or other reasoning techniques. Data mining has many applications in security including for national security as well as for cyber security. The threats to national security include attacking buildings, destroying critical infrastructures such as power grids and telecommunication systems. Data mining techniques are being investigated to find out who the suspicious people are and who is capable of carrying out terrorist activities. Cyber security is involved with protecting the computer and network systems against corruption due to Trojan horses, worms and viruses. Data mining is also being applied to provide solutions such as intrusion detection and auditing. The first part of the presentation will discuss my joint research with Prof. Latifur Khan and our students at the University of Texas at Dallas on data mining for cyber security applications. For example, anomaly detection techniques could be used to detect unusual patterns and behaviors. Link analysis may be used to trace the viruses to the perpetrators. Classification may be used to group various cyber attacks and then use the profiles to detect an attack when it occurs. Prediction may be used to determine potential future attacks depending in a way on information learned about terrorists through email and phone conversations. Data mining is also being applied for intrusion detection and auditing. Other applications include data mining for malicious code detection such as worm detection and managing firewall policies. This second part of the presentation will discuss the various types of threats to national security and describe data mining techniques for handling such threats. Threats include non real-time threats and real time threats. We need to understand the types of threats and also gather good data to carry out mining and obtain usef- - ul results. The challenge is to reduce false positives and false negatives. The third part of the presentation will discuss some of the research challenges. We need some form of real-time data mining, that is, the results have to be generated in real-time, we also need to build models in real-time for real-time intrusion detection. Data mining is also being applied for credit card fraud detection and biometrics related applications. While some progress has been made on topics such as stream data mining, there is still a lot of work to be done here. Another challenge is to mine multimedia data including surveillance video. Finally, we need to maintain the privacy of individuals. Much research has been carried out on privacy preserving data mining. In summary, the presentation will provide an overview of data mining, the various types of threats and then discuss the applications of data mining for malicious code detection and cyber security. Then we will discuss the consequences to privacy.
Keywords :
authorisation; data mining; invasive software; pattern matching; Dallas; Trojan horses; University of Texas; anomaly detection techniques; auditing; biometrics related applications; computer system protection; credit card fraud detection; cyber attacks; cyber security; data mining techniques; firewall policy management; intrusion detection; malicious code detection; national security; network system protection; pattern extraction; pattern matching; pattern querying; power grids; security applications; telecommunication systems; terrorist activities; worm detection; Computer science; Computer security; Data mining; Educational institutions; Real time systems; Terrorism;
Conference_Titel :
Intelligence and Security Informatics Conference (EISIC), 2011 European
Conference_Location :
Athens
Print_ISBN :
978-1-4577-1464-1
Electronic_ISBN :
978-0-7695-4406-9
DOI :
10.1109/EISIC.2011.80