Title :
The Improved Logistic Regression Models for Spam Filtering
Author :
Han, Yong ; Yang, Muyun ; Qi, Haoliang ; He, Xiaoning ; Li, Sheng
Author_Institution :
Comput. Sci. & Technol. Dept., Heilongjiang Inst. of Technol., Harbin, China
Abstract :
The logistic regression model has achieved success in spam filtering, but it is disadvantaged by the equal adjustment, during training, of the weights of features that appear in both spam and ham messages. This paper presents an improved logistic regression model that reduces the impact of features appearing in both spam and ham messages. Byte-level n-grams are employed to extract features from messages, and TONE (train on or near error) is adopted; both have proved effective in state-of-the-art spam filtering systems. The official runs of the CEAS (Conference on Email and Anti-Spam) Spam-filter Challenge 2008 show that the proposed model is one of the best methods. Our system achieved competitive results in all tasks and won the active learning task on the live stream as measured by 1-ROCA.
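Illustrative sketch (not the authors' code): the abstract names two building blocks, byte-level n-gram features and TONE training, and the Python below shows roughly how they could combine with a plain online logistic regression filter. The class name ToneLogisticFilter, the function byte_ngrams, and the values n=4, rate=0.1 and margin=0.4 are assumptions chosen for illustration only. The abstract does not specify how the improved model down-weights features shared by spam and ham, so that step is deliberately left out.

```python
import math


def byte_ngrams(message: bytes, n: int = 4) -> set:
    """Return the distinct overlapping byte n-grams of a raw message."""
    return {message[i:i + n] for i in range(len(message) - n + 1)}


class ToneLogisticFilter:
    """Baseline online logistic regression with TONE updates (hypothetical sketch)."""

    def __init__(self, n: int = 4, rate: float = 0.1, margin: float = 0.4):
        self.n = n            # n-gram length in bytes (assumed value)
        self.rate = rate      # learning rate (assumed value)
        self.margin = margin  # "near error" band around p = 0.5 (assumed value)
        self.weights = {}     # one weight per byte n-gram seen in training

    def score(self, message: bytes) -> float:
        """Estimated probability that the message is spam."""
        z = sum(self.weights.get(g, 0.0) for g in byte_ngrams(message, self.n))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, message: bytes, is_spam: bool) -> bool:
        """Apply one TONE step; return True if the weights were updated."""
        p = self.score(message)
        wrong = (p > 0.5) != is_spam           # misclassified
        near = abs(p - 0.5) <= self.margin     # right, but close to the boundary
        if not (wrong or near):
            return False                       # confident and correct: skip
        y = 1.0 if is_spam else 0.0
        # Plain logistic regression gradient step: every feature present in
        # the message gets the same adjustment, the behavior the abstract
        # identifies as a weakness of the baseline model.
        for g in byte_ngrams(message, self.n):
            self.weights[g] = self.weights.get(g, 0.0) + self.rate * (y - p)
        return True
```

In use, each incoming message would first be scored with score() and then passed to train() once its true label is known. Because of TONE, messages that are already classified correctly and confidently cause no weight update.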
Keywords :
information filtering; learning (artificial intelligence); logistics; regression analysis; unsolicited e-mail; 1-ROCA; active learning; byte level n-grams; improved logistic regression models; spam messages; state-of-the-art spam filtering system; Ad hoc networks; Buffer storage; Computer science; Delay; Disruption tolerant networking; Interference; Jamming; Logistics; Routing; Unsolicited electronic mail; byte level n-gram; improved logistic regression; online learning; spam filtering;
Conference_Title :
Asian Language Processing, 2009. IALP '09. International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-0-7695-3904-1
DOI :
10.1109/IALP.2009.74