• DocumentCode
    1814261
  • Title
    An evaluation on the efficiency of hybrid feature selection in spam email classification
  • Author
    Mohamad, Masurah ; Selamat, Ali
  • Author_Institution
    Software Eng. Res. Group (SERG), Univ. Teknol. Malaysia, Johor Bahru, Malaysia
  • fYear
    2015
  • fDate
    21-23 April 2015
  • Firstpage
    227
  • Lastpage
    231
  • Abstract
    In this paper, we discuss a spam filtering technique that combines two types of feature selection methods in its classification task. Spam, also known as unwanted messages, continues to flood electronic mailboxes despite the spam filtering systems provided by email service providers. The issue is frequently raised by Internet users and has attracted many researchers to work on fighting spam. A number of frameworks, algorithms, toolkits, systems and applications have been proposed, developed and applied by researchers and developers to protect users from spam. Several steps need to be considered in the classification task, such as data pre-processing, feature selection, feature extraction, training and testing. One of the main processes in the classification task is feature selection, which is used to reduce the dimensionality of the word-frequency features without affecting the performance of the classification task. Accordingly, we conducted an experiment to test the efficiency of the proposed Hybrid Feature Selection, which combines Term Frequency Inverse Document Frequency (TFIDF) with rough set theory, on the spam email classification problem. The results show that the proposed Hybrid Feature Selection returns good results.
  • Keywords
    Internet; feature selection; information filtering; pattern classification; security of data; unsolicited e-mail; TFIDF; data preprocessing; electronic mail boxes; email service provider; feature extraction; hybrid feature selection method; spam email classification problem; spam filtering technique; term frequency inverse document frequency; Accuracy; Filtering; Machine learning algorithms; Set theory; Testing; Unsolicited electronic mail; Spam; algorithm; rough set theory
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer, Communications, and Control Technology (I4CT), 2015 International Conference on
  • Conference_Location
    Kuching
  • Type
    conf
  • DOI
    10.1109/I4CT.2015.7219571
  • Filename
    7219571