Title :
Web sensitive text filtering by combining semantics and statistics
Author :
Wu, Ou ; Hu, Weiming
Author_Institution :
Nat. Lab. of Pattern Recognition, Chinese Acad. of Sci., Beijing, China
fDate :
30 Oct.-1 Nov. 2005
Abstract :
Web sensitive information is defined as texts, pictures and other forms of information on the Web that contain erotic content. How to filter this harmful information has attracted researchers' interest, and governments have given strong support to research on this problem in order to keep Web content safe. This paper first briefly reviews recent developments in Web sensitive information filtering; the statistical and semantic features of sensitive texts are then analyzed and represented by a CNN-like word net. Finally, a novel method that combines semantics and statistics is proposed to filter sensitive text on the Web. Experimental results have demonstrated the proposed method's promising performance.
Keywords :
Internet; information filtering; statistical analysis; text analysis; CNN-like word net; Web sensitive information filtering; erotic Web content; Automation; Government; Information filtering; Information filters; Laboratories; Pattern recognition; Statistics; Uniform resource locators; Web pages;
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN :
0-7803-9361-9
DOI :
10.1109/NLPKE.2005.1598819