Title :
Centralized content-based Web filtering and blocking: how far can it go?
Author :
Ding, Chen ; Chi, Chi-hung ; Deng, Jing ; Dong, Chun-Lei
Author_Institution :
Sch. of Comput., Nat. Univ. of Singapore, Singapore
fDate :
1999
Abstract :
Centralized Internet filtering and blocking is very important to an organisation. Educators and parents would like to keep offensive material away from children, and companies want to reduce the amount of work time that employees spend on non-productive Web surfing. Current blocking and filtering mechanisms can roughly be classified into two approaches: URL-based filtering and content filtering. In the URL-based approach, a requested URL is blocked if it matches an entry in the blocked list; however, keeping the list up to date is very difficult. In the content filtering approach, keyword matching is often used, and its main problem is mis-blocking: many desirable Web sites are blocked because predefined keywords appear in their pages, though with a different meaning or in a different context. There are suggestions to use image, audio and video understanding in real-time content filtering, but delay time is also of great concern. In this paper, we investigate how far multimedia content analysis should go for Internet filtering and blocking. A set of guidelines for defining the heuristics used in real-time Web content analysis is also given. These heuristics not only achieve higher filtering accuracy than most multimedia retrieval techniques, but also have a runtime overhead comparable to that of keyword matching. Our experience of deploying a pornography filtering system in high schools is also described. Experience from the system's implementation and deployment is found to give very good direction for the centralized filtering and blocking of Web content.
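The two baseline approaches named in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the host name, the keyword, and the function names `url_blocked` and `keyword_blocked` are hypothetical, chosen only to show how a blocked-list lookup works and how naive keyword matching can mis-block a desirable page.

```python
from urllib.parse import urlsplit

# Hypothetical data for illustration only; not taken from the paper.
BLOCKED_HOSTS = {"blocked.example"}
BLOCKED_KEYWORDS = {"breast"}


def url_blocked(url: str) -> bool:
    """URL-based approach: block when the host is on the blocked list."""
    host = urlsplit(url).hostname or ""
    return host.lower() in BLOCKED_HOSTS


def keyword_blocked(page_text: str) -> bool:
    """Content approach: block when any listed keyword occurs in the page text."""
    words = page_text.lower().split()
    return any(word.strip(".,;:!?") in BLOCKED_KEYWORDS for word in words)
```

Under these assumptions, `keyword_blocked("Early screening for breast cancer saves lives.")` returns `True` even though the page is a health resource, which is the mis-blocking problem the abstract describes. `url_blocked` avoids that error but only catches hosts already on the list.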
Keywords :
Internet; information analysis; information resources; multimedia systems; real-time systems; Internet filtering; URL address blocking; Web site blocking; World Wide Web; centralized Web filtering; delay time; filtering accuracy; heuristics; high schools; keyword matching; misblocking; multimedia content analysis; multimedia retrieval techniques; nonproductive Web surfing; offensive materials; pornography; real-time content filtering; runtime overhead; Delay effects; Educational institutions; Guidelines; Information filtering; Information filters; Internet; Matched filters; Runtime; Uniform resource locators; Web pages;
Conference_Titel :
Systems, Man, and Cybernetics, 1999. IEEE SMC '99 Conference Proceedings. 1999 IEEE International Conference on
Conference_Location :
Tokyo
Print_ISBN :
0-7803-5731-0
DOI :
10.1109/ICSMC.1999.825218