Title :
Active learning with simplified SVMs for spam categorization
Author :
Li, Kun-Lun ; Huang, Hou-Kuan ; Tian, Sheng-Feng
Author_Institution :
School of Computer and Information Technology, Northern Jiaotong University, Beijing, China
Abstract :
We propose a method for spam categorization based on support vector machines (SVMs) that uses an active learning strategy. We study the use of SVMs to classify e-mail as spam or nonspam. However, standard SVM training algorithms generally produce solutions with more support vectors than are strictly necessary, so we apply an algorithm that recognizes and eliminates the unnecessary support vectors. We analyze the particular properties of this task and identify why SVMs, and simplified SVMs in particular, are well suited to handling spam. Instead of being trained on a randomly selected training set, the learner has access to a pool of unlabeled instances and can request labels for some number of them. We introduce a new method for choosing which instances to request next.
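The abstract names neither the simplification algorithm nor the details of the new query-selection rule. The Python sketch below is an illustrative assumption, not the authors' method. `simplify` removes support vectors that are exactly linearly dependent on other support vectors in feature space (the idea behind exact SVM simplification, shown with a linear kernel) and folds their coefficients into the kept vectors, so the decision function is unchanged. `select_queries` uses the common closest-to-the-hyperplane heuristic as a stand-in for the paper's own selection rule. All function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Linear kernel k(u, v) = <u, v>; replacing it tests dependence in another feature space."""
    return sum(a * b for a, b in zip(u, v))


def decision(svs: List[Vector], coefs: List[float], b: float, x: Vector) -> float:
    """SVM decision value f(x) = sum_i coef_i * k(sv_i, x) + b, where coef_i = alpha_i * y_i."""
    return sum(c * dot(s, x) for s, c in zip(svs, coefs)) + b


def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve the square system a @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def simplify(svs: List[Vector], coefs: List[float],
             tol: float = 1e-9) -> Tuple[List[Vector], List[float]]:
    """Drop support vectors lying in the span of earlier kept ones (in feature space),
    folding each dropped coefficient into the kept ones so f(x) is unchanged."""
    new_coefs = list(coefs)
    basis: List[int] = []  # indices of the support vectors that are kept
    for j in range(len(svs)):
        rhs = [dot(svs[p], svs[j]) for p in basis]
        gram = [[dot(svs[p], svs[q]) for q in basis] for p in basis]
        c = solve(gram, rhs) if basis else []
        kjj = dot(svs[j], svs[j])
        # Squared distance from phi(x_j) to span{phi(x_p) : p in basis}.
        resid = kjj - sum(ci * ri for ci, ri in zip(c, rhs))
        if resid <= tol * max(1.0, kjj):
            # phi(x_j) = sum_k c_k phi(x_basis[k]), so its term is redistributed exactly.
            for k, p in enumerate(basis):
                new_coefs[p] += coefs[j] * c[k]
        else:
            basis.append(j)
    return [svs[i] for i in basis], [new_coefs[i] for i in basis]


def select_queries(pool: List[Vector], svs: List[Vector], coefs: List[float],
                   b: float, k: int) -> List[int]:
    """Stand-in query rule: indices of the k pool instances with the smallest |f(x)|."""
    ranked = sorted(range(len(pool)), key=lambda i: abs(decision(svs, coefs, b, pool[i])))
    return ranked[:k]


if __name__ == "__main__":
    # Toy expansion (not from the paper): sv 2 = sv 0 + sv 1 and sv 3 = 2 * sv 0.
    svs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
    coefs = [0.5, -0.75, 0.25, -0.5]
    b = 0.1
    small_svs, small_coefs = simplify(svs, coefs)
    print(len(small_svs))  # 2
    x = [3.0, -2.0]
    print(abs(decision(svs, coefs, b, x) - decision(small_svs, small_coefs, b, x)) < 1e-9)  # True
    pool = [[0.4, 0.0], [3.0, 3.0], [0.0, 0.2]]
    print(sorted(select_queries(pool, small_svs, small_coefs, b, 2)))  # [0, 2]
```

Because the simplified expansion yields the same f(x) with fewer kernel evaluations, scoring a large unlabeled pool for query selection becomes cheaper. Whether the paper combines the two steps in this way is not stated in the abstract.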
Keywords :
electronic mail; learning automata; pattern classification; statistical analysis; text analysis; active learning; e-mail; simplified support vector machines; spam categorization; unlabeled instances; Electronic mail; Information technology; Machine learning; Mathematics; Postal services; Risk management; Support vector machine classification; Support vector machines; Unsolicited electronic mail;
Conference_Title :
Proceedings of the 2002 International Conference on Machine Learning and Cybernetics
Print_ISBN :
0-7803-7508-4
DOI :
10.1109/ICMLC.2002.1167390