Title :
Traffic classification-based spam filter
Author :
Zhang, Ni ; Jiang, Yu ; Fang, Binxing ; Cheng, Xueqi ; Guo, Li
Author_Institution :
Software Division, Institute of Computing Technology, Chinese Academy of Sciences, 100080, Beijing, China; Graduate School of Chinese Academy of Sciences, 100039, Beijing, China
Abstract :
We propose an unsupervised spam filter, Bulk Mail Traffic Classification (BMTC), that filters junk mail from the perspective of ISPs. Our insight is that spammers generally send unsolicited email in bulk with few alterations to a common message content, a pattern that becomes visible in a large-scale traffic environment. In our approach, we classify email delivery traffic into categories by the similarity of message contents. We then decide whether a particular email category is spam from the number of similar mails in that category and take measures to filter it. We also design a simulator, two sketch data structures, and a series of algorithms to support our method. We have applied BMTC to email traffic data captured at one of the largest commercial Internet service providers in China, and the experimental results indicate that our method can achieve a 70.4% reduction of emails. The results also show that BMTC is practical: it can be implemented in a high-volume traffic environment handling millions of mails every day with small memory consumption.
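The counting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the two sketch data structures or the similarity measure, so this example assumes a count-min sketch for counting and a min-hash over word shingles as a stand-in for content similarity. All names (`fingerprint`, `CountMinSketch`, `is_bulk`) and parameters (`shingle_size`, `width`, `depth`, `threshold`) are hypothetical.

```python
import hashlib
import re


def _hash(data: str, seed: int) -> int:
    """64-bit hash of a string; the seed selects one of several hash functions."""
    digest = hashlib.blake2b(
        data.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def fingerprint(body: str, shingle_size: int = 3) -> int:
    """Assumed similarity key: the minimum hash over lowercase word shingles.

    Messages that share most of their shingles are likely to share this key.
    """
    words = re.findall(r"\w+", body.lower())
    if len(words) < shingle_size:
        shingles = [" ".join(words)]
    else:
        shingles = [
            " ".join(words[i:i + shingle_size])
            for i in range(len(words) - shingle_size + 1)
        ]
    return min(_hash(s, 0) for s in shingles)


class CountMinSketch:
    """Fixed-memory approximate counter; estimates never undercount."""

    def __init__(self, width: int = 2048, depth: int = 4) -> None:
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def add(self, key: int) -> int:
        """Increment the key in every row and return its estimated count."""
        estimate = None
        for seed, row in enumerate(self.rows, start=1):
            idx = _hash(str(key), seed) % self.width
            row[idx] += 1
            estimate = row[idx] if estimate is None else min(estimate, row[idx])
        return estimate


def is_bulk(sketch: CountMinSketch, body: str, threshold: int = 3) -> bool:
    """Flag a message once its category has been seen `threshold` times."""
    return sketch.add(fingerprint(body)) >= threshold
```

Under these assumptions, memory use is fixed by `width` and `depth` regardless of how many messages pass through, which matches the abstract's emphasis on small memory consumption. A real deployment would need a tuned threshold and a more robust similarity measure than a single min-hash value.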
Keywords :
Data structures; Delay; Information filtering; Information filters; Postal services; Protection; Telecommunication traffic; Traffic control; Unsolicited electronic mail; Web and internet services;
Conference_Title :
Communications, 2006. ICC '06. IEEE International Conference on
Conference_Location :
Istanbul
Print_ISBN :
1-4244-0355-3
Electronic_ISBN :
8164-9547
DOI :
10.1109/ICC.2006.255085