• DocumentCode
    3101256
  • Title

    The Improved Logistic Regression Models for Spam Filtering

  • Author

    Han, Yong ; Yang, Muyun ; Qi, Haoliang ; He, Xiaoning ; Li, Sheng

  • Author_Institution
    Comput. Sci. & Technol. Dept., Heilongjiang Inst. of Technol., Harbin, China
  • fYear
    2009
  • fDate
    7-9 Dec. 2009
  • Firstpage
    314
  • Lastpage
    317
  • Abstract
    The logistic regression model has achieved success in spam filtering. However, during training it applies the same weight adjustment to features that appear in both spam and ham messages, which puts it at a disadvantage. This paper presents an improved logistic regression model that reduces the impact of features appearing in both spam and ham messages. Byte-level n-grams are used to extract features from messages, and TONE (train on or near error) is adopted; both have proved effective in state-of-the-art spam filtering systems. The official runs of the CEAS (Conference on Email and Anti-Spam) 2008 Spam-Filter Challenge show that the proposed model is among the best methods. Our system achieved competitive results in all tasks and won the active learning task on the live stream, as measured by 1-ROCA.
  • Keywords
    information filtering; learning (artificial intelligence); logistics; regression analysis; unsolicited e-mail; 1-ROCA; active learning; byte level n-grams; improved logistic regression models; spam messages; state-of-the-art spam filtering system; Ad hoc networks; Buffer storage; Computer science; Delay; Disruption tolerant networking; Interference; Jamming; Logistics; Routing; Unsolicited electronic mail; byte level n-gram; improved logistic regression; online learning; spam filtering;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing, 2009. IALP '09. International Conference on
  • Conference_Location
    Singapore
  • Print_ISBN
    978-0-7695-3904-1
  • Type

    conf

  • DOI
    10.1109/IALP.2009.74
  • Filename
    5380724
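
The abstract names three ingredients: byte-level n-gram features, an online logistic regression classifier, and TONE (train on or near error) updates. The sketch below shows how these might fit together in a plain baseline filter. It does not reproduce the paper's improved weighting for features shared by spam and ham, which the abstract does not specify. The names `byte_ngrams` and `OnlineLogisticRegression`, the n-gram length of 4, the learning rate, and the TONE margin are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def byte_ngrams(message: bytes, n: int = 4) -> set:
    """Return the set of distinct byte-level n-grams in a message."""
    return {message[i:i + n] for i in range(len(message) - n + 1)}


class OnlineLogisticRegression:
    """Baseline online logistic regression over binary n-gram features.

    Hypothetical illustration: hyperparameters are assumed, not taken
    from the paper.
    """

    def __init__(self, learning_rate: float = 0.1,
                 tone_margin: float = 0.1, n: int = 4):
        self.weights = defaultdict(float)
        self.learning_rate = learning_rate
        self.tone_margin = tone_margin  # half-width of the "near error" band
        self.n = n

    def score(self, features: set) -> float:
        """Estimated probability that a message with these features is spam."""
        z = sum(self.weights.get(f, 0.0) for f in features)
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, message: bytes, is_spam: bool) -> bool:
        """TONE update: adjust weights only on an error or a near miss.

        Returns True if the weights were updated.
        """
        features = byte_ngrams(message, self.n)
        p = self.score(features)
        misclassified = (p >= 0.5) != is_spam
        near_boundary = abs(p - 0.5) < self.tone_margin
        if not (misclassified or near_boundary):
            return False
        # Gradient step for log loss with binary features; every active
        # feature receives the same adjustment in this baseline.
        step = self.learning_rate * ((1.0 if is_spam else 0.0) - p)
        for f in features:
            self.weights[f] += step
        return True


model = OnlineLogisticRegression()
spam = b"buy cheap pills now"
ham = b"meeting at noon tomorrow"
model.train(spam, True)
model.train(ham, False)
print(model.score(byte_ngrams(spam)) > 0.5)  # spam is scored above the threshold
print(model.score(byte_ngrams(ham)) < 0.5)   # ham is scored below the threshold
```

Because every active feature receives the same step in `train`, an n-gram common to both classes is pushed back and forth by equal amounts. That is the behavior the abstract identifies as the weakness of the standard model.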