  • DocumentCode
    2831746
  • Title
    Adaptive spam filtering using dynamic feature space
  • Author
    Zhou, Yan; Mulekar, Madhuri S.; Nerellapalli, Praveen
  • Author_Institution
    Sch. of CIS, South Alabama Univ., Mobile, AL
  • fYear
    2005
  • fDate
    16 Nov. 2005
  • Lastpage
    309
  • Abstract
    Unsolicited bulk e-mail, also known as spam, is a growing problem for the e-mail community. This paper presents a new spam filtering strategy that 1) uses a practical entropy coding technique, Huffman coding, to dynamically encode the feature space of e-mail collections over time, and 2) applies an online algorithm to adaptively refine the learned spam concept as new e-mail data becomes available. The contributions of this work include a highly efficient spam filtering algorithm in which the input space is reduced to a one-dimensional input vector, and an adaptive learning technique that is robust to vocabulary change, concept drift, and skewed data distributions. We compare our technique with several existing offline learning techniques, including support vector machines, naive Bayes, k-nearest neighbor, the C4.5 decision tree, RBFNetwork, boosted decision trees, and stacking, and demonstrate its effectiveness with experimental results on publicly available e-mail data.
  • Keywords
    Huffman codes; entropy codes; unsolicited e-mail; Huffman coding; adaptive learning; adaptive spam filtering; concept drifting; dynamic feature space; entropy coding; online algorithm; skewed data distribution; unsolicited bulk email; vocabulary change; Adaptive filters; Decision trees; Electronic mail; Filtering algorithms; Machine learning; Robustness; Support vector machines; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Tools with Artificial Intelligence, 2005. ICTAI 05. 17th IEEE International Conference on
  • Conference_Location
    Hong Kong
  • ISSN
    1082-3409
  • Print_ISBN
    0-7695-2488-5
  • Type
    conf
  • DOI
    10.1109/ICTAI.2005.28
  • Filename
    1562953
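The abstract does not specify how the Huffman code is turned into a one-dimensional input, nor which online algorithm is used. The Python sketch below is therefore a hypothetical illustration, not the authors' method. It assumes that each e-mail is mapped to the mean Huffman code length of its tokens under a code built from spam token frequencies, with a fixed penalty (`unseen_bits`) for tokens the code has not seen. It also swaps in a simple mistake-driven threshold update as a stand-in for the online learner. The names `huffman_code_lengths`, `encode_email`, `OnlineThreshold`, `unseen_bits`, and `step` are all illustrative.

```python
import heapq
import itertools


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A lone symbol still needs one bit to be encoded.
        return {next(iter(freqs)): 1}
    counter = itertools.count()  # tie-breaker so heapq never compares dicts
    heap = [(f, next(counter), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return heap[0][2]


def encode_email(tokens, lengths, unseen_bits):
    """Hypothetical 1-D feature: mean code length of the e-mail's tokens.

    Short codes belong to frequent spam tokens, so a low value is spam-like.
    """
    if not tokens:
        return float(unseen_bits)
    return sum(lengths.get(t, unseen_bits) for t in tokens) / len(tokens)


class OnlineThreshold:
    """Hypothetical online learner on a scalar input: spam if x < theta."""

    def __init__(self, theta, step=0.5):
        self.theta = theta
        self.step = step

    def predict(self, x):
        return x < self.theta

    def update(self, x, is_spam):
        # Adjust the threshold only when the current prediction is wrong.
        predicted = self.predict(x)
        if predicted and not is_spam:
            self.theta -= self.step  # false positive: flag fewer e-mails
        elif not predicted and is_spam:
            self.theta += self.step  # missed spam: flag more e-mails
```

For example, with spam token counts `{"free": 3, "win": 2, "money": 1}` the code assigns 1 bit to `free` and 2 bits to `win` and `money`, so `["free", "money"]` encodes to 1.5 while an e-mail of unseen words encodes to the penalty value. In this sketch, updating the token counts (for example with `collections.Counter.update`) and rebuilding the code is what lets the feature space shift as the vocabulary changes.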