Title :
Bayesian spam classification: Time efficient radix encoded fragmented database approach
Author :
Jatana, Nishtha ; Sharma, Kamna
Author_Institution :
Dept. of Comput. Sci. & Eng., Maharaja Surajmal Inst. of Technol., Delhi, India
Abstract :
Spam, or unsolicited email, has become a major problem for companies and private users. This paper presents the problems associated with spam and various approaches that attempt to deal with it. Statistical classifiers are one such group of methods; they show adequate performance in filtering spam based on knowledge gathered from previously collected and classified emails. Learning algorithms that use the Naive Bayesian classifier have shown promising results in separating spam from legitimate mail. An encoded and fragmented database approach that resembles the radix sort technique is proposed and applied for the first time to improve Paul Graham's Naive Bayes machine learning algorithm for spam filtering. The main objective of this paper is to reduce the overall time taken by spam detection. Quantitative and qualitative analysis of the proposed technique, performed on two public spam databases (SpamAssassin and Ling Spam), shows improved time performance. The proposed method performed up to six times faster than Paul Graham's existing Bayesian approach.
Keywords :
Bayes methods; database management systems; information filtering; learning (artificial intelligence); pattern classification; sorting; unsolicited e-mail; Bayesian spam classification; Ling Spam; SpamAssassin; filtering spam; legitimate mail; naive Bayes machine learning algorithm; naive Bayesian classifier; public spam databases; qualitative analysis; quantitative analysis; radix sort technique; spam detection; spam email; spam filtering; statistical classifiers; time efficient radix encoded fragmented database approach; unsolicited email; Databases; Filtering; Postal services; Training; Unsolicited electronic mail; Bayesian; Probability; Spam; Tokenization
Conference_Title :
Computing for Sustainable Global Development (INDIACom), 2014 International Conference on
Conference_Location :
New Delhi
Print_ISBN :
978-93-80544-10-6
DOI :
10.1109/IndiaCom.2014.6828102