• DocumentCode
    571480
  • Title

    Detecting Malicious Websites by Learning IP Address Features

  • Author

    Chiba, Daiki ; Tobe, Kazuhiro ; Mori, Tatsuya ; Goto, Shigeki

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Waseda Univ., Tokyo, Japan
  • fYear
    2012
  • fDate
    16-20 July 2012
  • Firstpage
    29
  • Lastpage
    39
  • Abstract
    Web-based malware attacks have become one of the most serious threats that need to be addressed urgently. Several approaches that have attracted attention as promising ways of detecting such malware rely on various blacklists. However, these conventional approaches often fail to detect new attacks owing to the versatility of malicious websites. Thus, it is difficult to keep blacklists up to date with information on new malicious websites. To tackle this problem, we propose a new method for detecting malicious websites using the characteristics of IP addresses. Our approach leverages the empirical observation that IP addresses are more stable than other metrics such as URLs and DNS names. While the strings that form URLs or domain names are highly variable, IP addresses are less variable, i.e., the IPv4 address space is mapped onto 4-byte strings. We develop a lightweight and scalable detection scheme based on machine learning. The aim of this study is not to provide a single solution that effectively detects web-based malware but to develop a technique that compensates for the drawbacks of existing approaches. We validate the effectiveness of our approach by using real IP address data from existing blacklists and real traffic data on a campus network. The results demonstrate that our method can expand the coverage and accuracy of existing blacklists and also detect unknown malicious websites that are not covered by conventional approaches.
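The abstract's observation that an IPv4 address is a fixed 4-byte string can be illustrated with a minimal sketch. It shows one plausible way to turn an address into fixed-length numeric features that a classifier could consume. The function names `ip_to_octets` and `ip_to_bits` are hypothetical, and the octet and bit encodings are assumptions made for illustration; the record does not state which encoding the authors actually use.

```python
import ipaddress


def ip_to_octets(addr: str) -> list[int]:
    """Return the four bytes of an IPv4 address as integers (0-255)."""
    # IPv4Address.packed is the 4-byte big-endian representation.
    return list(ipaddress.IPv4Address(addr).packed)


def ip_to_bits(addr: str) -> list[int]:
    """Return a 32-element 0/1 vector, most significant bit first.

    Assumption: a bit-level encoding is only one possible feature layout.
    """
    value = int(ipaddress.IPv4Address(addr))
    return [(value >> shift) & 1 for shift in range(31, -1, -1)]


if __name__ == "__main__":
    # 192.0.2.1 is a documentation-only address (TEST-NET-1).
    print(ip_to_octets("192.0.2.1"))
    print(ip_to_bits("192.0.2.1"))
```

Unlike URLs or domain names, every address yields a vector of the same length, which is what makes this kind of input convenient for a lightweight learning scheme.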
  • Keywords
    Web sites; invasive software; learning (artificial intelligence); 4-bytes strings; DNS; IP address features; IPv4 address space; URL; Web-based malware attacks; blacklists; campus network; machine learning technique; malicious Websites detection; traffic data; Browsers; Feature extraction; IP networks; Malware; Support vector machines; Training; Vectors; Blacklist; Drive-by-download; IP address; Machine learning; Web-based malware;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Applications and the Internet (SAINT), 2012 IEEE/IPSJ 12th International Symposium on
  • Conference_Location
    Izmir
  • Print_ISBN
    978-1-4673-2001-6
  • Electronic_ISBN
    978-0-7695-4737-4
  • Type

    conf

  • DOI
    10.1109/SAINT.2012.14
  • Filename
    6305258