Title :
MML inference of Finite State Automata for probabilistic spam detection
Author :
Saikrishna, Vidya ; Dowe, David L. ; Ray, Sid
Author_Institution :
Monash Univ., Clayton, VIC, Australia
Abstract :
Minimum Message Length (MML) has emerged as a powerful tool for the inductive inference of discrete, continuous and hybrid structures. The Probabilistic Finite State Automaton (PFSA) is one such discrete structure, and it needs to be inferred in several classes of problems in computer science, including artificial intelligence, pattern recognition and data mining. MML has also served as a viable tool for many classes of problems in machine learning, both supervised and unsupervised, of which classification is the most common. This research has two parts: the first focusses on inferring the best PFSA using MML, and the second focusses on the classification problem of spam detection. Using the best PFSA inferred in the first part, the spam detection theory was tested using MML on the publicly available Enron Spam dataset. The filter was evaluated on performance measures such as precision and recall. The evaluation also took the cost of misclassification into account through the weighted accuracy rate and the weighted error rate. The results of our empirical evaluation indicate a classification accuracy of around 93%, which outperforms well-known established spam filters.
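The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the cost-sensitive definitions commonly used in the spam-filtering literature, in which each legitimate message is counted `lam` times; the abstract does not state the exact formulas or the weight used, so the function name `spam_metrics`, the parameter `lam`, and the counts in the usage note are all hypothetical.

```python
def spam_metrics(tp, fp, tn, fn, lam=1.0):
    """Compute precision, recall and cost-weighted accuracy/error for a spam filter.

    tp: spam classified as spam
    fp: legitimate mail classified as spam (the costly mistake)
    tn: legitimate mail classified as legitimate
    fn: spam classified as legitimate
    lam: assumed cost weight; each legitimate message counts lam times
    """
    n_spam = tp + fn
    n_legit = tn + fp
    if tp + fp == 0 or n_spam == 0 or n_spam + n_legit == 0:
        raise ValueError("counts must include spam messages and spam predictions")

    precision = tp / (tp + fp)   # fraction of flagged mail that is really spam
    recall = tp / n_spam         # fraction of spam that was caught

    # Weighted rates: legitimate messages are weighted by lam (assumed definition).
    denom = lam * n_legit + n_spam
    weighted_accuracy = (lam * tn + tp) / denom
    weighted_error = (lam * fp + fn) / denom

    return {
        "precision": precision,
        "recall": recall,
        "weighted_accuracy": weighted_accuracy,
        "weighted_error": weighted_error,
    }


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    print(spam_metrics(tp=90, fp=5, tn=95, fn=10, lam=9))
```

With `lam=1` the weighted accuracy reduces to ordinary accuracy. Larger values of `lam` penalise blocking legitimate mail more heavily than letting spam through.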
Keywords :
data mining; pattern classification; statistical distributions; unsolicited e-mail; unsupervised learning; Bayesian information theory; MML; artificial intelligence; computer science; discrete structure; Enron spam dataset; inferred PFSA; machine learning; minimum message length; pattern recognition; performance parameters; probabilistic finite state automaton; probabilistic spam detection; spam detection theory; spam filtering; weighted accuracy rate; weighted error rate; Accuracy; Automata; Encoding; Merging; Postal services; Probabilistic logic; Unsolicited electronic mail; Finite State Automaton (FSA); Minimum Message Length (MML); Probabilistic Finite State Automaton (PFSA)
Conference_Title :
Advances in Pattern Recognition (ICAPR), 2015 Eighth International Conference on
Conference_Location :
Kolkata
DOI :
10.1109/ICAPR.2015.7050655