Title :
Identifying junk electronic mail in Microsoft Outlook with a support vector machine
Author :
Woitaszek, Matthew ; Shaaban, Muhammad ; Czernikowski, Roy
Author_Institution :
Colorado Univ., Boulder, CO, USA
Abstract :
In this paper, we utilize a simple support vector machine to identify commercial electronic mail. The use of a personalized dictionary for model training provided a classification accuracy of 96.69%, while a much larger system dictionary achieved 95.26%. The classification system was subsequently implemented as an add-in for Microsoft Outlook XP, providing sorting and grouping capabilities using Outlook's interface to the typical desktop e-mail user.
Keywords :
document handling; electronic mail; Microsoft Outlook XP; classification accuracy; commercial electronic mail; desktop e-mail user; grouping capabilities; model training; personalized dictionary; simple support vector machine; sorting capabilities; spam; unsolicited commercial electronic mail; Dictionaries; Electronic mail; Filtering; Internet; Large-scale systems; Privacy; Sorting; Support vector machine classification; Support vector machines; Text categorization;
Conference_Title :
Applications and the Internet, 2003. Proceedings. 2003 Symposium on
Print_ISBN :
0-7695-1872-9
DOI :
10.1109/SAINT.2003.1183045