• DocumentCode
    1811077
  • Title

    Classifying Web pages by content

  • Author

    Smith, D. M. ; Harvey, Richard ; Chan, Yi ; Bangham, J. Andrew

  • Author_Institution
    Sch. of Inf. Syst., East Anglia Univ., Norwich, UK
  • fYear
    1999
  • fDate
    1999
  • Firstpage
    8/1
  • Lastpage
    8/7
  • Abstract
    This paper describes a classification strategy for multimedia documents and reviews the prospects for detecting and filtering documents, such as Web pages, that may be pornographic. We examine several colour filtering algorithms with a view to producing a reliable skin filter. The results, obtained with very simple features extracted from an image-only database containing around two thousand hand-labelled images, are surprisingly good. When the image results are combined with a simple text analysis scheme, we are able to achieve a very accurate classification.
  • Keywords
    information resources; Web pages classification; automated pornography detector; colour filtering algorithms; document detection; document filtering; feature extraction; image results; image-only database; multimedia documents; reliable skin filter; skin segmentation algorithm; text analysis;
  • fLanguage
    English
  • Publisher
    iet
  • Conference_Titel
    Distributed Imaging (Ref. No. 1999/109), IEE European Workshop
  • Conference_Location
    London
  • Type

    conf

  • DOI
    10.1049/ic:19990619
  • Filename
    831201