Detecting Malicious Websites by Learning IP Address Features

Author

Chiba, Daiki ; Tobe, Kazuhiro ; Mori, Tatsuya ; Goto, Shigeki

Author_Institution

Dept. of Comput. Sci. & Eng., Waseda Univ., Tokyo, Japan

fYear

2012

fDate

16-20 July 2012

Firstpage

29

Lastpage

39

Abstract

Web-based malware attacks have become one of the most serious threats that need to be addressed urgently. Several approaches that have attracted attention as promising ways of detecting such malware employ various blacklists. However, these conventional approaches often fail to detect new attacks owing to the versatility of malicious websites. Thus, it is difficult to maintain up-to-date blacklists with information regarding new malicious websites. To tackle this problem, we propose a new method for detecting malicious websites using the characteristics of IP addresses. Our approach leverages the empirical observation that IP addresses are more stable than other metrics such as URLs and DNS names. While the strings that form URLs or domain names are highly variable, IP addresses are less variable, i.e., the IPv4 address space is mapped onto 4-byte strings. We develop a lightweight and scalable detection scheme based on a machine learning technique. The aim of this study is not to provide a single solution that effectively detects web-based malware but to develop a technique that compensates for the drawbacks of existing approaches. We validate the effectiveness of our approach by using real IP address data from existing blacklists and real traffic data on a campus network. The results demonstrate that our method can expand the coverage/accuracy of existing blacklists and also detect unknown malicious websites that are not covered by conventional approaches.
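
The abstract notes that an IPv4 address maps onto a 4-byte string, which makes it a compact, fixed-length input for a learning algorithm. The following is a minimal sketch of that idea, not the authors' implementation: the abstract does not specify how features are extracted, so the two encodings shown here (per-octet values and a 32-bit vector) and the function names `ip_to_octets` and `ip_to_bits` are illustrative assumptions.

```python
import ipaddress


def ip_to_octets(ip: str) -> list[int]:
    """Return the four octets of an IPv4 address as integers (0-255).

    Assumption: one possible fixed-length encoding of the 4-byte string.
    """
    return list(ipaddress.IPv4Address(ip).packed)


def ip_to_bits(ip: str) -> list[int]:
    """Return the 32 bits of an IPv4 address, most significant bit first.

    Assumption: a bit-level encoding, so addresses sharing a network
    prefix share the leading elements of the vector.
    """
    value = int(ipaddress.IPv4Address(ip))
    return [(value >> shift) & 1 for shift in range(31, -1, -1)]


if __name__ == "__main__":
    # 192.0.2.1 is from the documentation range (TEST-NET-1), not real data.
    print(ip_to_octets("192.0.2.1"))
    print(ip_to_bits("192.0.2.1"))
```

Either vector could then be passed, together with a malicious/benign label, to a classifier of the reader's choice; the indexing keywords for this record mention support vector machines.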

Keywords

Web sites; invasive software; learning (artificial intelligence); 4-bytes strings; DNS; IP address features; IPv4 address space; URL; Web-based malware attacks; blacklists; campus network; machine learning technique; malicious Websites detection; traffic data; Browsers; Feature extraction; IP networks; Malware; Support vector machines; Training; Vectors; Blacklist; Drive-by-download; IP address; Machine learning; Web-based malware;

fLanguage

English

Publisher

IEEE

Conference_Titel

Applications and the Internet (SAINT), 2012 IEEE/IPSJ 12th International Symposium on

Conference_Location

Izmir

Print_ISBN

978-1-4673-2001-6

Electronic_ISBN

978-0-7695-4737-4

Type

conf

DOI

10.1109/SAINT.2012.14

Filename

6305258