Title of article :
Filtering Spam E-Mail from Mixed Arabic and English Messages: A Comparison of Machine Learning Techniques
Author/Authors :
Alaa El-Halees (author)
Issue Information :
Journal, serial issue, year 2009
Abstract :
Spam is one of the main problems in email communication. Although the volume of spam in languages other than English is increasing, little work has been done in this area. For example, users in the Arab world receive spam written mostly in Arabic, in English, or in mixed Arabic and English. To filter this kind of message, this research applied several machine learning techniques, which many researchers have used to filter spam email. The study compared six supervised machine learning classifiers: maximum entropy, decision trees, artificial neural networks, naive Bayes, support vector machines, and k-nearest neighbor. The experiments suggested that words in Arabic messages should be stemmed before a classifier is applied. In addition, in most cases, the experiments showed that classifiers using feature selection techniques can achieve comparable or better performance than classifiers that do not use them.
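The abstract names the pipeline (stem Arabic words, optionally select features, then classify) but not the stemmer, the feature selection method, or the data. As a rough illustration only, the sketch below assumes a crude light stemmer (the hypothetical affix lists `PREFIXES` and `SUFFIXES`), a hypothetical document-frequency feature selector `select_features`, and a multinomial naive Bayes classifier, which is one of the six classifiers compared. The toy messages are invented and are not from the study's corpus; this is not the author's implementation.

```python
import math
import re
from collections import Counter

# Hypothetical light stemmer: strips at most one common Arabic prefix and one
# suffix. These lists are illustrative assumptions, not the study's stemmer.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "و")
SUFFIXES = ("ها", "ات", "ون", "ين", "ة")


def light_stem(token):
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= 2:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= 2:
            token = token[:-len(s)]
            break
    return token


def tokenize(text):
    # In Python 3, \w matches Arabic as well as Latin letters, so mixed
    # Arabic/English messages split into words; English words pass through
    # light_stem unchanged because the affix lists are Arabic only.
    return [light_stem(t) for t in re.findall(r"\w+", text.lower())]


def select_features(docs, k):
    # Hypothetical feature selection: keep the k terms found in the most documents.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {w for w, _ in df.most_common(k)}


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels, vocab):
        self.vocab = vocab
        self.classes = sorted(set(labels))
        self.log_prior, self.log_like = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d if t in vocab)
            total = sum(counts.values())
            self.log_like[c] = {
                w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab
            }
        return self

    def predict(self, tokens):
        # Terms outside the selected vocabulary are ignored.
        scores = {
            c: self.log_prior[c]
            + sum(self.log_like[c][t] for t in tokens if t in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Invented toy messages (not from the study's corpus).
train = [
    ("اربح جائزة مجانية اليوم", "spam"),
    ("win a free prize now", "spam"),
    ("عرض مجاني free offer", "spam"),
    ("موعد الاجتماع غدا", "ham"),
    ("meeting agenda for tomorrow", "ham"),
    ("التقرير attached report", "ham"),
]
docs = [tokenize(text) for text, _ in train]
labels = [label for _, label in train]
vocab = select_features(docs, k=50)
model = NaiveBayes().fit(docs, labels, vocab)
print(model.predict(tokenize("free جائزة")))  # → spam
```

Because stemming and feature selection are separate functions, either can be dropped to compare variants with and without it, which is the kind of contrast the abstract reports. The sketch makes no claim about how the study actually configured its experiments.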
Keywords :
Anti-spam filtering, Machine learning techniques, Text data mining
Journal title :
The International Arab Journal of Information Technology (IAJIT)