DocumentCode :
387530
Title :
Active learning with simplified SVMs for spam categorization
Author :
Li, Kun-Lun ; Huang, Hou-Kuan ; Tian, Sheng-Feng
Author_Institution :
Sch. of Comput. & Inf. Technol., Northern Jiaotong Univ., Beijing, China
Volume :
3
fYear :
2002
fDate :
2002
Firstpage :
1198
Abstract :
We propose a method for spam categorization based on support vector machines (SVMs) with an active learning strategy. We study the use of SVMs for classifying e-mail as spam or nonspam. Standard SVM training algorithms, however, generally produce solutions with more support vectors than strictly necessary. We apply an algorithm that recognizes and eliminates the unnecessary support vectors. We analyze the particular properties of this task and identify why SVMs, especially simplified SVMs, are appropriate for dealing with spam. Instead of using a randomly selected training set, the learner has access to a pool of unlabeled instances and can request labels for some of them. We introduce a new method for choosing which instances to request next.
Keywords :
electronic mail; learning automata; pattern classification; statistical analysis; text analysis; active learning; e-mail; simplified support vector machines; spam categorization; unlabeled instances; Electronic mail; Information technology; Machine learning; Mathematics; Postal services; Risk management; Support vector machine classification; Support vector machines; Unsolicited electronic mail; Virtual colonoscopy;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
Print_ISBN :
0-7803-7508-4
Type :
conf
DOI :
10.1109/ICMLC.2002.1167390
Filename :
1167390