• DocumentCode
    3045137
  • Title
    Genetic-based feature selection for spam detection
  • Author
    Arani, Seyyed Hossein Seyyedi ; Mozaffari, Saeed
  • Author_Institution
    Qazvin Branch, Islamic Azad Univ., Qazvin, Iran
  • fYear
    2013
  • fDate
    14-16 May 2013
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In recent years, email has become a pervasive and economical means of communication, but spam has reduced its usefulness. To meet this challenge, email filtering emerged and developed as a special kind of text classification. A main problem in text classification tasks, and a more serious one in email filtering, is the large number of features. To address it, various feature selection methods have been considered, which extract a lower-dimensional feature space from the original one and offer it as input to the classifier. In this regard, we examined the effectiveness of two existing individual methods and propose a new combined method. The methods tested individually are Information Gain (IG) and the χ² statistic (CHI); our combined method applies a genetic algorithm (GA) to the top features selected by IG. We used a Perceptron neural network as the classifier. To evaluate our system, experiments were conducted on the PU data set. The results showed that the individual methods are very effective in reducing the dimensionality of the input space while increasing classifier performance, and that the combined method further improves performance even though it reduces dimensionality further.
  • Keywords
    genetic algorithms; multilayer perceptrons; pattern classification; statistical analysis; text analysis; unsolicited e-mail; χ² statistic; CHI; GA; IG; combinational method; economical communication means; email; genetic algorithm; genetic-based feature selection; information gain; lower dimensional feature space; perceptron neural network; pervasive communication means; spam detection; text classification; Biological cells; Electronic mail; Feature extraction; Filtering; Genetic algorithms; Neural networks; Text categorization; feature selection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electrical Engineering (ICEE), 2013 21st Iranian Conference on
  • Conference_Location
    Mashhad
  • Type
    conf
  • DOI
    10.1109/IranianCEE.2013.6599551
  • Filename
    6599551
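
The abstract names Information Gain (IG) as the ranking step whose top features are then passed to the genetic algorithm. The record gives no implementation details, so the following is only a minimal sketch of how IG can be computed for a binary "term occurs in the email" feature. It assumes each email is represented as a set of tokens; the function names (entropy, information_gain, top_k_terms) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document'.

    docs   : list of token sets, one per email (assumed representation)
    labels : list of class labels aligned with docs
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Entropy of the labels after splitting on presence/absence of the term.
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Rank the vocabulary by IG and keep the k highest-scoring terms."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


# Toy data, invented for illustration only.
docs = [{"free", "offer"}, {"free", "win"},
        {"meeting", "agenda"}, {"agenda", "offer"}]
labels = ["spam", "spam", "ham", "ham"]
selected = top_k_terms(docs, labels, 2)
```

In this toy set, "free" and "agenda" each separate the two classes perfectly, while "offer" occurs once in each class and carries no information. A genetic search over subsets of the selected terms, as the abstract describes, would start from a list like `selected`.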