Title :
A hybrid approach for spam filtering using local concentration based K-Means clustering
Author :
Jain, Kunal ; Agrawal, Sanjay
Author_Institution :
Dept. of CEA, Nat. Inst. of Tech. Teachers' Training & Res., Bhopal, India
Abstract :
Electronic mail (email) has become essential for Internet users. Many studies indicate that the number of Internet users is increasing day by day, and as the online population grows, the volume of email traffic grows with it. Of this volume, 80% consists of unwanted email. These unwanted messages are known as spam, also referred to as unsolicited bulk email (UBE), and are sent in bulk to large numbers of recipients. The increased volume of spam creates a very common problem: maintaining the email inbox. Spam is a major issue for the Internet community because it wastes resources and also pollutes our environment. To prevent these adverse effects, spam filtering is an essential task. Researchers have proposed many techniques and algorithms for spam filtering, which focus on individual parameters of the malicious content. Spammers, however, have also become more intelligent and attack the weak points of a filtering system. In this work we divide the filtering process into four stages. In the first stage, a string tokenizer generates terms (tokens) from the incoming message. These tokens are passed to the second stage, where Information Gain (IG) is applied as the term selection strategy. The selected terms are passed to the third stage, which consists of a Local Concentration based Artificial Immune System for feature selection. The newly constructed feature vectors are passed to the K-Means clustering algorithm for classification in the fourth stage. In support of our work, we conducted several experiments and give a comparative analysis against various existing methods on different parameters.
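The four stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the regular-expression tokenizer, the rule that assigns selected terms to spam and legitimate "gene libraries", the equal-sized windows, and the deterministic K-Means initialization are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import re

def tokenize(message):
    """Stage 1 (assumed tokenizer): lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", message.lower())

def entropy(pos, neg):
    """Entropy in bits of a two-class split with the given counts."""
    total = pos + neg
    h = 0.0
    for n in (pos, neg):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h

def information_gain(term, docs, labels):
    """Stage 2: IG of a term's presence w.r.t. the label (1 = spam, 0 = legitimate)."""
    n = len(docs)
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    h_cond = 0.0
    for part in (with_t, without_t):
        if part:
            h_cond += len(part) / n * entropy(sum(part), len(part) - sum(part))
    return entropy(sum(labels), n - sum(labels)) - h_cond

def select_terms(docs, labels, m):
    """Keep the m terms with the highest IG (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    return sorted(vocab, key=lambda t: (-information_gain(t, docs, labels), t))[:m]

def build_libraries(terms, docs, labels):
    """Assumed rule: a term joins the library of the class where it is more frequent."""
    n_spam = sum(labels) or 1
    n_ham = (len(labels) - sum(labels)) or 1
    spam_lib, ham_lib = set(), set()
    for t in terms:
        spam_rate = sum(1 for d, y in zip(docs, labels) if y == 1 and t in d) / n_spam
        ham_rate = sum(1 for d, y in zip(docs, labels) if y == 0 and t in d) / n_ham
        (spam_lib if spam_rate > ham_rate else ham_lib).add(t)
    return spam_lib, ham_lib

def lc_features(tokens, spam_lib, ham_lib, n_windows=1):
    """Stage 3: per-window spam and legitimate concentrations, concatenated."""
    size = max(1, math.ceil(len(tokens) / n_windows))
    features = []
    for i in range(n_windows):
        window = set(tokens[i * size:(i + 1) * size])
        n = len(window) or 1  # avoid division by zero for empty windows
        features.append(len(window & spam_lib) / n)
        features.append(len(window & ham_lib) / n)
    return features

def kmeans(points, k=2, iters=20):
    """Stage 4: plain Lloyd's K-Means, seeded with the first k distinct points."""
    centroids = []
    for p in points:
        if list(p) not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids

def run_pipeline(messages, labels, m=5, n_windows=1):
    """Chain the four stages and return one cluster index per message."""
    token_lists = [tokenize(msg) for msg in messages]
    docs = [set(t) for t in token_lists]
    terms = select_terms(docs, labels, m)
    spam_lib, ham_lib = build_libraries(terms, docs, labels)
    vectors = [lc_features(t, spam_lib, ham_lib, n_windows) for t in token_lists]
    return kmeans(vectors, k=2)[0]
```

Calling `run_pipeline(messages, labels)` with a list of message strings and 0/1 labels returns cluster indices only. Because K-Means is unsupervised, mapping each cluster to "spam" or "legitimate" needs an extra step (for example, a majority vote over labeled messages in the cluster), which the abstract does not describe.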
Keywords :
artificial immune systems; feature selection; information filtering; pattern classification; pattern clustering; probability; unsolicited e-mail; vectors; IG; UBE; electronic mail; feature classification; feature vectors; information gain; k-means clustering algorithm; local concentration based artificial immune system; spam email; spam filtering; string tokenizer; term selection strategy; unsolicited bulk email; Classification algorithms; Feature extraction; Filtering; Immune system; Internet; Unsolicited electronic mail; AIS; Information Gain (IG); K-means Clustering; Legitimate; Spam
Conference_Title :
Confluence The Next Generation Information Technology Summit (Confluence), 2014 5th International Conference
Conference_Location :
Noida
Print_ISBN :
978-1-4799-4237-4
DOI :
10.1109/CONFLUENCE.2014.6949373