  • DocumentCode
    1674890
  • Title

    Filtering Spam by Using Factors Hyperbolic Tree

  • Author

    Hou, Hailong ; Chen, Yan ; Beyah, Raheem ; Zhang, Yan-Qing

  • Author_Institution
    Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA
  • fYear
    2008
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Most current anti-spam techniques, such as the Bayesian anti-spam algorithm, primarily use lexical matching to filter unsolicited bulk E-mails (UBE) and unsolicited commercial E-mails (UCE). However, the precision of spam filtering is usually low when lexical matching algorithms are used in real dynamic environments. For example, an E-mail of refrigerator advertisements is useful for most families, but it is useless for Eskimos. Lexical matching anti-spam algorithms cannot distinguish E-mails that are junk to most people but useful to others. We propose a Factors Hyperbolic Tree (FHT) based algorithm that, unlike lexical matching algorithms, handles spam filtering in a dynamic environment by considering various relevant factors. A new Ranked Term Frequency (RTF) algorithm is proposed to extract indicators from E-mails that are related to environmental factors. Type-1 and Type-2 fuzzy logic systems are used to evaluate the indicators and determine whether E-mails are spam based on the environmental factors. Additionally, the weights of factors in an FHT database are continuously updated according to dynamic conditional factors in a real environment. Simulation results show that the FHT algorithm filters out spam with high precision. Furthermore, the FHT algorithm is more efficient than other methods when filtering E-mails with complex influencing factors. The main contribution of this paper is that the FHT based algorithm filters E-mails based on influencing factors instead of matched words, allowing dynamic filtering of spam E-mails.
  • Keywords
    fuzzy logic; fuzzy set theory; information filtering; pattern matching; tree data structures; type theory; unsolicited e-mail; factors hyperbolic tree database; lexical matching algorithm; ranked term frequency algorithm; spam filtering; type-2 fuzzy logic system; unsolicited commercial bulk e-mail filtering; Bayesian methods; Databases; Electronic mail; Environmental factors; Filtering algorithms; Frequency; Fuzzy logic; Matched filters; Refrigeration; Unsolicited electronic mail;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Global Telecommunications Conference, 2008. IEEE GLOBECOM 2008. IEEE
  • Conference_Location
    New Orleans, LA
  • ISSN
    1930-529X
  • Print_ISBN
    978-1-4244-2324-8
  • Type

    conf

  • DOI
    10.1109/GLOCOM.2008.ECP.362
  • Filename
    4698137