DocumentCode
3417266
Title
Enhanced content analysis of fraudulent Nigeria electronic mails using e-STAT
Author
Longe, O.B. ; Abayomi-Alli, A. ; Shaib, I. I O ; Longe, F.A.
Author_Institution
Int. Centre For IT & Dev., Southern Univ., Baton Rouge, LA, USA
fYear
2009
fDate
14-16 Jan. 2009
Firstpage
238
Lastpage
243
Abstract
A large percentage of fraudulent spam mails are believed to originate from Nigeria or from Nigerians in remote locations. These mails (popularly referred to as 419 spam) come in broad categories, but all are intended to defraud their recipients. Testing the validity of senders' and receivers' addresses is one method that has been used to filter spam mails. This approach will not filter out ordinary e-mails, since typical e-mail users will always include their true e-mail addresses to facilitate replies. Checking the IP addresses of 419 mails is a way of ascertaining their actual origin. This can be done to build a database of e-mail abuse or to blacklist addresses from which fraudulent mails originate, keeping in mind that blacklisted IP addresses could be used to stop the delivery of further mails from such addresses in the future. To this end, this research examines features selected specifically from the content analysis of Nigerian spam e-mail. A domain-specific statistical content analysis tool (e-STAT) was developed and implemented using a Bayesian statistical technique. The software was trained and tested with a sizeable balanced corpus of Nigerian 419 spam e-mails and normal (ham) e-mails. Analysis of classified mails using e-STAT showed current concept drift patterns among Nigerian 419 spammers and provided a blacklist of about 2,173 e-mail senders' addresses, 563 URLs within spam mails and a total of 13,491 bag-of-words terms common to Nigerian spam e-mails. The results will guide future research in the domain of 419 mails in designing effective spam filters and electronic mail classifiers.
Keywords
Bayes methods; information filtering; statistical analysis; unsolicited e-mail; Bayesian statistical technique; e-STAT; e-mail; fraudulent Nigeria electronic mails; fraudulent spam mails; statistical content analysis tool; Bayesian methods; Educational institutions; Electronic mail; Filters; Pattern analysis; Postal services; Software testing; Spatial databases; Statistics; Unsolicited electronic mail; 419; Blacklisting; Classifiers; Filtering; IP address; Nigeria; Spam; Spammers;
fLanguage
English
Publisher
IEEE
Conference_Titel
Adaptive Science & Technology, 2009. ICAST 2009. 2nd International Conference on
Conference_Location
Accra
ISSN
0855-8906
Print_ISBN
978-1-4244-3522-7
Electronic_ISBN
0855-8906
Type
conf
DOI
10.1109/ICASTECH.2009.5409717
Filename
5409717