DocumentCode
2227468
Title
WebGuard: Web based adult content detection and filtering system
Author
Hammami, Mohamed ; CHAHIR, Youssef ; Chen, Liming
Author_Institution
Ecole Centrale de Lyon, Ecully, France
fYear
2003
fDate
13-17 Oct. 2003
Firstpage
574
Lastpage
578
Abstract
We describe a Web filtering system, "WebGuard", which aims to automatically detect and filter adult content on the Web. WebGuard uses a Web crawler to extract relevant data from the Web and combines the textual content, the image content, and the URL name of a Web page to construct its feature vector. WebGuard uses data mining techniques to classify URLs into two classes: suspect URLs and normal URLs. The suspect URLs are stored in a database, which is constantly and automatically updated in order to reflect the highly dynamic evolution of the Web. At run time, WebGuard simply captures the URL requested by a user, matches it against the suspect URLs stored in the database, and takes an appropriate action (filtering or blocking) according to the result of the analysis. Our preliminary results show that it can detect and filter adult content effectively.
Keywords
Internet; client-server systems; data mining; information filters; information retrieval; URL name; Web crawler; Web page; WebGuard filtering system; adult content detection; client-server architecture; data extraction; data mining techniques; image content; normal URL; suspect URL; textual content; Crawlers; Data mining; Image databases; Information analysis; Information filtering; Information filters; Internet; Spatial databases; Uniform resource locators; Web pages;
fLanguage
English
Publisher
ieee
Conference_Titel
Web Intelligence, 2003. WI 2003. Proceedings. IEEE/WIC International Conference on
Print_ISBN
0-7695-1932-6
Type
conf
DOI
10.1109/WI.2003.1241271
Filename
1241271