• DocumentCode
    2312786
  • Title
    A Survey on Text Classification Techniques for E-mail Filtering
  • Author
    Upasana; Chakravarty, S.
  • Author_Institution
    Div. of Comput. Eng., Netaji Subhas Inst. of Technol., New Delhi, India
  • fYear
    2010
  • fDate
    9-11 Feb. 2010
  • Firstpage
    32
  • Lastpage
    36
  • Abstract
    The continuing explosive growth of textual content within the World Wide Web has given rise to the need for sophisticated Text Classification (TC) techniques that combine efficiency with high quality of results. E-mail filtering is one application that has the potential to affect every user of the Internet. Even though a large body of research has delved into this problem, there is a paucity of surveys that indicate trends and directions. This paper attempts to categorize the prevalent popular techniques for classifying email as spam or legitimate and to suggest possible techniques to fill in the lacunae. Our findings suggest that context-based email filtering has the most potential for improving quality by learning various contexts, such as n-gram phrases, linguistic constructs, or user-profile-based context, to tailor each user's filtering scheme.
  • Keywords
    Internet; classification; information filtering; text analysis; unsolicited e-mail; Internet; World Wide Web; context-based email filtering; email classification; legitimate email; linguistic construct; n-gram phrase; spam email; text classification; textual content; user profile; Application software; Bayesian methods; Electronic mail; Explosives; Information filtering; Information filters; Internet; Machine learning; Text categorization; Web sites;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Computing (ICMLC), 2010 Second International Conference on
  • Conference_Location
    Bangalore
  • Print_ISBN
    978-1-4244-6006-9
  • Electronic_ISBN
    978-1-4244-6007-6
  • Type
    conf
  • DOI
    10.1109/ICMLC.2010.61
  • Filename
    5460695