• DocumentCode
    2801371
  • Title
    Bayesian Chinese Spam Filter Based on Crossed N-gram
  • Author
    Dong, Jianshe; Cao, Haixia; Liu, Peng; Ren, Li
  • Author_Institution
    Lanzhou University of Technology, China
  • Volume
    3
  • fYear
    2006
  • fDate
    Oct. 2006
  • Firstpage
    103
  • Lastpage
    108
  • Abstract
    Naive Bayesian spam filters are a well-known and powerful type of filter that can easily be induced from a dataset of sample cases. However, for Chinese email their performance is limited by the difficulty of word segmentation. In this paper, we present a Bayesian Chinese spam filter based on crossed N-grams. The method does not need to segment Chinese email into words, so it is not affected by inaccurate segmentation. It also does not require a word-segmentation dictionary, and because it improves space and time efficiency, it is easy to install on the user's terminal to build a personalized spam filter. The independence assumption of the naive Bayes method is also relaxed to some degree. Experimental results show that the proposed method achieves high accuracy at low cost.
  • Keywords
    Bayesian methods; Computer networks; Data engineering; Dictionaries; Educational technology; Grid computing; Information filtering; Information filters; Military computing; Unsolicited electronic mail;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems Design and Applications, 2006. ISDA '06. Sixth International Conference on
  • Conference_Location
    Jinan, China
  • Print_ISBN
    0-7695-2528-8
  • Type
    conf
  • DOI
    10.1109/ISDA.2006.17
  • Filename
    4021867
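The abstract describes a naive Bayes spam filter that works on crossed N-grams instead of segmented Chinese words. Below is a minimal Python sketch of that idea, not the authors' implementation. It assumes that "crossed N-gram" means overlapping character N-grams taken with a one-character sliding window, and it uses a multinomial naive Bayes model with Laplace (add-one) smoothing. The names `char_ngrams` and `NaiveBayesFilter` and the toy training strings are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Return overlapping character n-grams (sliding window, step 1).

    Assumption: this is our reading of "crossed N-gram"; the paper may
    define it differently. No word segmentation or dictionary is used.
    Whitespace is dropped; non-empty text shorter than n yields one gram.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < n:
        return ["".join(chars)] if chars else []
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


class NaiveBayesFilter:
    """Hypothetical multinomial naive Bayes filter over n-gram counts."""

    LABELS = ("spam", "ham")

    def __init__(self, n: int = 2) -> None:
        self.n = n
        self.gram_counts = {lbl: Counter() for lbl in self.LABELS}
        self.doc_counts = {lbl: 0 for lbl in self.LABELS}

    def train(self, text: str, label: str) -> None:
        # Accumulate n-gram frequencies and the document count per class.
        self.gram_counts[label].update(char_ngrams(text, self.n))
        self.doc_counts[label] += 1

    def log_posterior(self, text: str, label: str) -> float:
        # log P(label) + sum of log P(gram | label), add-one smoothed.
        # Requires at least one training message in each class.
        vocab = set(self.gram_counts["spam"]) | set(self.gram_counts["ham"])
        total = sum(self.gram_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for gram in char_ngrams(text, self.n):
            count = self.gram_counts[label][gram]
            score += math.log((count + 1) / (total + len(vocab)))
        return score

    def classify(self, text: str) -> str:
        # Pick the class with the highest log posterior.
        return max(self.LABELS, key=lambda lbl: self.log_posterior(text, lbl))


# Toy, made-up training data for illustration only.
f = NaiveBayesFilter(n=2)
f.train("免费发票代开优惠", "spam")
f.train("恭喜中奖请汇款领取", "spam")
f.train("明天下午开会讨论项目", "ham")
f.train("请查收项目进度报告", "ham")

print(char_ngrams("发票代开"))     # ['发票', '票代', '代开']
print(f.classify("代开发票优惠"))  # spam
print(f.classify("项目会议报告"))  # ham
```

Overlapping grams such as 票代 straddle what a segmenter would treat as word boundaries, so the model sees adjacent-character context without any dictionary. This is one plausible reading of how the approach loosens the independence assumption, not a statement of the paper's exact formulation.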