DocumentCode :
2801371
Title :
Bayesian Chinese Spam Filter Based on Crossed N-gram
Author :
Dong, Jianshe ; Cao, Haixia ; Liu, Peng ; Ren, Li
Author_Institution :
Lanzhou University of Technology, China
Volume :
3
fYear :
2006
fDate :
Oct. 2006
Firstpage :
103
Lastpage :
108
Abstract :
Naive Bayesian spam email filters are a well-known and powerful type of filter that can easily be induced from a dataset of sample cases. However, the difficulty of segmenting words in Chinese email limits their performance. In this paper, we present a Bayesian Chinese spam filter based on crossed N-grams. The method does not require word segmentation of Chinese emails, so it is not limited by inaccurate segmentation. It also does not need a word segmentation dictionary, and because its space and time efficiency are improved, it is easy to install on the user terminal to build an individualized spam filter. The independence assumption of the naive Bayes method is relaxed to some degree. Experimental results show that the proposed method achieves high accuracy at low cost.
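The segmentation-free idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the crossed N-grams are constructed, so the code below assumes plain overlapping character bigrams as a stand-in, with add-one smoothing; the names `char_ngrams` and `NgramNaiveBayes` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Return overlapping character n-grams; no word segmentation is used.

    Assumption: plain overlapping n-grams stand in for the paper's
    crossed N-grams, whose exact construction the abstract does not give.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical two-class naive Bayes filter over character n-grams."""

    LABELS = ("spam", "ham")

    def __init__(self, n=2):
        self.n = n
        self.counts = {label: Counter() for label in self.LABELS}
        self.docs = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        self.counts[label].update(char_ngrams(text, self.n))
        self.docs[label] += 1

    def log_score(self, text, label):
        # Vocabulary size plus one slot reserved for unseen n-grams.
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) + 1
        total = sum(self.counts[label].values())
        # Smoothed class prior, so an untrained class never yields log(0).
        prior = (self.docs[label] + 1) / (sum(self.docs.values()) + len(self.LABELS))
        score = math.log(prior)
        for gram in char_ngrams(text, self.n):
            # Add-one smoothing on the per-class n-gram likelihood.
            score += math.log((self.counts[label][gram] + 1) / (total + vocab))
        return score

    def classify(self, text):
        return max(self.LABELS, key=lambda label: self.log_score(text, label))
```

Because the features are raw character sequences, the sketch needs no segmentation dictionary. Overlapping n-grams also carry some local context between adjacent characters, which is one plausible reading of how the independence assumption is relaxed.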
Keywords :
Bayesian methods; Computer networks; Data engineering; Dictionaries; Educational technology; Grid computing; Information filtering; Information filters; Military computing; Unsolicited electronic mail;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Systems Design and Applications, 2006. ISDA '06. Sixth International Conference on
Conference_Location :
Jinan, China
Print_ISBN :
0-7695-2528-8
Type :
conf
DOI :
10.1109/ISDA.2006.17
Filename :
4021867