DocumentCode :
3245603
Title :
Automatic junk e-mail filtering based on latent content
Author :
Bellegarda, Jerome R. ; Naik, Devang ; Silverman, Kim E A
Author_Institution :
Spoken Language Group, Apple Comput. Inc., Cupertino, CA, USA
fYear :
2003
fDate :
30 Nov.-3 Dec. 2003
Firstpage :
465
Lastpage :
470
Abstract :
The explosion in unsolicited mass electronic mail (junk e-mail) over the past decade has sparked interest in automatic filtering solutions. Traditional techniques tend to rely on header analysis, keyword/keyphrase matching and analogous rule-based predicates, and/or some probabilistic model of text generation. This paper aims instead at deciding whether or not the latent subject matter is consistent with the user's interests. The underlying framework is latent semantic analysis: each e-mail is automatically classified against two semantic anchors, one for legitimate and one for junk messages. Experiments show that this approach is competitive with the state of the art in e-mail classification, and potentially advantageous in real-world applications with high junk-to-legitimate ratios. The resulting technology was released in August 2002 as part of the e-mail client bundled with the Mac OS X 10.2 operating system.
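The two-anchor classification described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes raw term counts (no term weighting), a rank-k decomposition of the term-document matrix W obtained by power iteration on W^T W, and semantic anchors taken as the centroids of each class's training e-mails in the latent space. The names `LatentJunkFilter`, `top_eigenpairs`, the labels "legit"/"junk" and the toy messages are all made up for the example. A new message is mapped into the same latent space and assigned to whichever anchor is closer by cosine similarity.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    # Crude lowercase word tokenizer (an illustrative assumption).
    return re.findall(r"[a-z0-9']+", text.lower())


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def top_eigenpairs(A, k, iters=300, seed=0):
    """Top-k eigenpairs of a symmetric positive semidefinite matrix,
    found by power iteration with deflation (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(A)
    A = [row[:] for row in A]
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _step in range(iters):
            w = [dot(row, v) for row in A]
            nw = norm(w)
            if nw == 0.0:
                break
            v = [x / nw for x in w]
        lam = dot(v, [dot(row, v) for row in A])  # Rayleigh quotient
        if lam <= 1e-12:
            break  # remaining spectrum is (numerically) zero
        pairs.append((lam, v))
        # Deflate so the next power iteration finds the next eigenvector.
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


class LatentJunkFilter:
    """Hypothetical two-anchor LSA classifier (a sketch, not the paper's code)."""

    def __init__(self, docs, labels, k=2):
        self.vocab = sorted({t for d in docs for t in tokenize(d)})
        self.index = {t: i for i, t in enumerate(self.vocab)}
        cols = [self._counts(d) for d in docs]  # columns of the term-document matrix W
        n = len(cols)
        gram = [[dot(cols[i], cols[j]) for j in range(n)] for i in range(n)]  # W^T W
        self.basis = []  # left singular vectors u_1..u_k of W
        for lam, v in top_eigenpairs(gram, k):
            s = math.sqrt(lam)  # singular value of W
            self.basis.append(
                [sum(cols[j][t] * v[j] for j in range(n)) / s for t in range(len(self.vocab))]
            )
        # Assumption: each semantic anchor is the centroid of its class in latent space.
        self.anchors = {}
        for label in sorted(set(labels)):
            pts = [self._project(c) for c, y in zip(cols, labels) if y == label]
            self.anchors[label] = [sum(p[i] for p in pts) / len(pts) for i in range(len(self.basis))]

    def _counts(self, text):
        # Raw term counts; any weighting used in the paper is not reproduced here.
        vec = [0.0] * len(self.vocab)
        for t, c in Counter(tokenize(text)).items():
            if t in self.index:
                vec[self.index[t]] = float(c)
        return vec

    def _project(self, vec):
        # Latent coordinates U^T d; for a training column j this equals S v_j.
        return [dot(u, vec) for u in self.basis]

    def classify(self, text):
        r = self._project(self._counts(text))

        def cosine(a):
            d = norm(a) * norm(r)
            return dot(a, r) / d if d else -1.0

        # Closest anchor by angle; a message with no known words falls back to the first label.
        return max(self.anchors, key=lambda y: cosine(self.anchors[y]))


if __name__ == "__main__":
    docs = [
        "meeting agenda for the project review",
        "project schedule and meeting notes",
        "review the budget before the meeting",
        "buy cheap pills now limited offer",
        "cheap offer win money now",
        "win a free prize click now",
    ]
    labels = ["legit", "legit", "legit", "junk", "junk", "junk"]
    spam_filter = LatentJunkFilter(docs, labels, k=2)
    print(spam_filter.classify("cheap pills offer now"))
    print(spam_filter.classify("notes from the project meeting"))
```

Power iteration is used only to keep the sketch dependency-free; with NumPy one would normally call `numpy.linalg.svd` on W directly and keep the top k singular triplets.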
Keywords :
classification; text analysis; unsolicited e-mail; automatic junk e-mail filtering; e-mail classification; e-mail client; e-mail latent content; header analysis; junk-to-legitimate ratio; keyword/keyphrase matching; latent semantic analysis; rule-based predicates; semantic anchors; unsolicited mass electronic mail; Business; Costs; Databases; Electronic mail; Explosions; Filtering; Internet; Natural languages; Operating systems; Unsolicited electronic mail;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on
Print_ISBN :
0-7803-7980-2
Type :
conf
DOI :
10.1109/ASRU.2003.1318485
Filename :
1318485