• DocumentCode
    3044929
  • Title

    Detection of Hazardous Information Based on HTML Elements

  • Author

    Ikeda, Kazushi ; Yanagihara, Tadashi ; Matsumoto, Kazunori ; Takishima, Yasuhiro

  • Author_Institution
    KDDI R&D Labs. Inc., Saitama, Japan
  • fYear
    2010
  • fDate
    1-4 Nov. 2010
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    In this paper, we propose fast, accurate algorithms for detecting hazardous Web pages. The algorithms automatically select strings that are characteristic of the HTML elements of hazardous Web pages and use combinations of these strings as features for SVMs (support vector machines) to detect such pages. Because the algorithms do not rely on the text of a Web page, they can detect pages that existing text-based algorithms have difficulty detecting. In a large-scale performance evaluation on real hazardous Web pages, we show that a hybrid of our algorithms and existing text-based algorithms increases precision by 9.3% over the text-based algorithms alone.
  • Keywords
    Web sites; hypermedia markup languages; security of data; support vector machines; text analysis; HTML element; hazardous Web page detection; hazardous information detection; text-based algorithm; Algorithm design and analysis; Classification algorithms; Feature extraction; Filtering; HTML; Training; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2010 IEEE RIVF International Conference on
  • Conference_Location
    Hanoi
  • Print_ISBN
    978-1-4244-8074-6
  • Type

    conf

  • DOI
    10.1109/RIVF.2010.5633302
  • Filename
    5633302