Regional Information Center for Science and Technology - Bayesian Chinese Spam Filter Based on Crossed N-gram

DocumentCode :

2801371

Title :

Bayesian Chinese Spam Filter Based on Crossed N-gram

Author :

Dong, Jianshe ; Cao, Haixia ; Liu, Peng ; Ren, Li

Author_Institution :

Lanzhou University of Technology, China

Volume :

fYear :

2006

fDate :

Oct. 2006

Firstpage :

103

Lastpage :

108

Abstract :

Naive Bayesian spam email filters are a well-known and powerful type of filter that can easily be induced from a dataset of sample cases. However, the difficulty of segmenting Chinese email text into words limits their performance. In this paper, we present a Bayesian Chinese spam filter based on crossed N-grams. This method does not need to segment Chinese emails into words, so it is not limited by inaccurate word segmentation. It also does not require a word-segmentation dictionary, and because its space and time efficiency are improved, it is easy to install on the user's terminal to build an individualized spam filter. The independence assumption of the naive Bayes method is also relaxed to some degree. Experimental results show that the proposed method achieves high accuracy at low cost.
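To illustrate how a Bayesian filter can work without word segmentation, here is a minimal sketch. It assumes that "crossed N-gram" means overlapping character N-grams (each N-gram shares N-1 characters with its neighbor) and uses a standard multinomial naive Bayes with Laplace smoothing. The names `char_ngrams` and `NGramNaiveBayes`, the default `n=2`, and the smoothing parameter `alpha` are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Split text into overlapping character n-grams (assumed reading of "crossed N-gram").

    No dictionary or word segmenter is needed: every window of n consecutive
    characters becomes a feature.
    """
    text = "".join(text.split())  # drop whitespace between characters
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NGramNaiveBayes:
    """Hypothetical multinomial naive Bayes over character n-grams."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n
        self.alpha = alpha  # Laplace (add-alpha) smoothing
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Accumulate n-gram frequencies for the given class label.
        self.counts[label].update(char_ngrams(text, self.n))
        self.docs[label] += 1

    def log_score(self, text, label):
        # log P(label) + sum of log P(ngram | label), with smoothing.
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total = sum(self.counts[label].values())
        denom = total + self.alpha * (len(vocab) + 1)  # +1 slot for unseen n-grams
        # Smoothed prior avoids log(0) if a class has no training documents.
        prior = (self.docs[label] + 1) / (sum(self.docs.values()) + 2)
        score = math.log(prior)
        for g in char_ngrams(text, self.n):
            score += math.log((self.counts[label][g] + self.alpha) / denom)
        return score

    def classify(self, text):
        return max(("spam", "ham"), key=lambda c: self.log_score(text, c))


if __name__ == "__main__":
    nb = NGramNaiveBayes(n=2)
    for t in ["免费发票代开优惠", "中奖免费领取奖品", "代开发票联系我们"]:
        nb.train(t, "spam")
    for t in ["明天开会讨论项目进度", "请查收项目报告附件", "会议时间改到明天下午"]:
        nb.train(t, "ham")
    print(char_ngrams("垃圾邮件"))  # ['垃圾', '圾邮', '邮件']
    print(nb.classify("免费代开发票"))
    print(nb.classify("明天下午开会"))
```

Because adjacent bigrams overlap, each feature carries some information about its neighbor, which is one plausible way such a model loosens the strict per-word independence assumption; the paper's exact weighting and feature selection may differ.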

Keywords :

Bayesian methods; Computer networks; Data engineering; Dictionaries; Educational technology; Grid computing; Information filtering; Information filters; Military computing; Unsolicited electronic mail;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Intelligent Systems Design and Applications, 2006. ISDA '06. Sixth International Conference on

Conference_Location :

Jinan, China

Print_ISBN :

0-7695-2528-8

Type :

conf

DOI :

10.1109/ISDA.2006.17

Filename :

4021867

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2801371