Title :
Filtering Spam by Using Factors Hyperbolic Tree
Author :
Hou, Hailong ; Chen, Yan ; Beyah, Raheem ; Zhang, Yan-Qing
Author_Institution :
Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA
Abstract :
Most current anti-spam techniques, such as the Bayesian anti-spam algorithm, rely primarily on lexical matching to filter unsolicited bulk E-mails (UBE) and unsolicited commercial E-mails (UCE). However, the precision of spam filtering is usually low when lexical matching algorithms are used in real dynamic environments. For example, an E-mail advertising refrigerators is useful for most families, but it is useless for Eskimos. Lexical matching anti-spam algorithms cannot identify E-mails that are junk to most people but useful to others. We propose a Factors Hyperbolic Tree (FHT) based algorithm that, unlike lexical matching algorithms, handles spam filtering in a dynamic environment by considering various relevant factors. A new Ranked Term Frequency (RTF) algorithm is proposed to extract indicators from E-mails that are related to environmental factors. Type-1 and Type-2 fuzzy logic systems are used to evaluate the indicators and determine whether E-mails are spam based on the environmental factors. Additionally, the weights of factors in an FHT database are continuously updated according to dynamic conditional factors in a real environment. Simulation results show that the FHT algorithm filters out spam with high precision. Furthermore, the FHT algorithm is more efficient than other methods when filtering E-mails with complex influencing factors. The main contribution of this paper is that the FHT-based algorithm filters E-mails according to influencing factors rather than matched words, which allows dynamic filtering of spam E-mails.
Keywords :
fuzzy logic; fuzzy set theory; information filtering; pattern matching; tree data structures; type theory; unsolicited e-mail; factors hyperbolic tree database; lexical matching algorithm; ranked term frequency algorithm; spam filtering; type-2 fuzzy logic system; unsolicited commercial bulk e-mail filtering; Bayesian methods; Databases; Electronic mail; Environmental factors; Filtering algorithms; Frequency; Fuzzy logic; Matched filters; Refrigeration; Unsolicited electronic mail;
Conference_Title :
Global Telecommunications Conference, 2008. IEEE GLOBECOM 2008. IEEE
Conference_Location :
New Orleans, LA
Print_ISBN :
978-1-4244-2324-8
DOI :
10.1109/GLOCOM.2008.ECP.362