• DocumentCode
    1633470
  • Title

    Enhance Term Weighting Algorithm as Feature Selection Technique for Illicit Web Content Classification

  • Author

    Lee, Zhi-Sam ; Maarof, Mohd Aizaini ; Selamat, Ali ; Shamsuddin, Siti Mariyam

  • Author_Institution
    Fac. of Comput. Sci. & Inf. Syst., Univ. Teknol. Malaysia, Skudai
  • Volume
    2
  • fYear
    2008
  • Firstpage
    145
  • Lastpage
    150
  • Abstract
    The exponential increase of information on the Internet has raised the issue of information security. Pornographic Web content is one of the most harmful resources, polluting the minds of children and teenagers. Several Web content-based analysis approaches have been proposed to prevent children from accessing such illicit Web content. However, the implementation of each solution remains an issue. Most of the approaches are weak at classifying highly similar Web content, such as pornography and gynecology Web pages. In this study, we try to solve this issue by proposing a modified term weighting scheme that is used as a term feature selection technique for illicit Web page classification. We examine the performance of the proposed technique on three data sets, which represent three critical scenarios, and compare it with the original term weighting scheme. Based on our observations, the proposed technique is superior for illicit Web page classification, achieving an average accuracy above 90%. The experimental results also indicate that the proposed technique improves on the original term weighting scheme. We hope that this study will give insight to other researchers, especially those working in similar areas.
  • Keywords
    Internet; security of data; enhance term weighting algorithm; feature selection technique; gynecology Web pages; illicit Web content classification; information security; pornography Web content; Business; Entropy; Gynaecology; Image analysis; Information filtering; Information filters; Pollution; Uniform resource locators; Web pages; feature selection; neural network; term weighting scheme; text categorization; web filtering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems Design and Applications, 2008. ISDA '08. Eighth International Conference on
  • Conference_Location
    Kaohsiung
  • Print_ISBN
    978-0-7695-3382-7
  • Type

    conf

  • DOI
    10.1109/ISDA.2008.171
  • Filename
    4696322