Title :
Classifying Web pages by content
Author :
Smith, Dm ; Harvey, Richard ; Chan, Yi ; Bangham, J. Andrew
Author_Institution :
Sch. of Inf. Syst., East Anglia Univ., Norwich, UK
Abstract :
This paper describes a classification strategy for multimedia documents and reviews the prospects for detecting and filtering documents, such as Web pages, that may be pornographic. We examine several colour filtering algorithms with a view to producing a reliable skin filter. The results, obtained using very simple features extracted from an image-only database containing around two thousand hand-labelled images, are surprisingly good. When the image results are combined with a simple text analysis scheme, we are able to achieve a very accurate classification.
Keywords :
information resources; Web pages classification; automated pornography detector; colour filtering algorithms; document detection; document filtering; feature extraction; image results; image-only database; multimedia documents; reliable skin filter; skin segmentation algorithm; text analysis
Conference_Title :
Distributed Imaging (Ref. No. 1999/109), IEE European Workshop
Conference_Location :
London
DOI :
10.1049/ic:19990619