Title :
Textual and Visual Content-Based Anti-Phishing: A Bayesian Approach
Author :
Zhang, Haijun ; Liu, Gang ; Chow, Tommy W S ; Liu, Wenyin
Author_Institution :
Dept. of Electron. Eng., City Univ. of Hong Kong, Kowloon, China
Abstract :
A novel framework using a Bayesian approach for content-based phishing web page detection is presented. Our model uses both textual and visual content to measure the similarity between a protected web page and suspicious web pages. A text classifier, an image classifier, and an algorithm that fuses the results of the two classifiers are introduced. A distinguishing feature of this paper is the use of a Bayesian model to estimate the matching threshold that the classifier requires to decide whether a web page is phishing or not. In the text classifier, the naive Bayes rule is used to calculate the probability that a web page is phishing. In the image classifier, the earth mover's distance is employed to measure visual similarity, and our Bayesian model is used to determine the threshold. In the data fusion algorithm, Bayes theory is used to synthesize the classification results from textual and visual content. The effectiveness of the proposed approach was examined on a large-scale dataset collected from real phishing cases. Experimental results demonstrated that the text classifier and the image classifier deliver promising results, that the fusion algorithm outperforms either individual classifier, and that the model can be adapted to different phishing cases.
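The abstract does not give the fusion formula, so the following is only a minimal sketch of one common way to combine two classifier outputs with Bayes' rule, not the authors' algorithm. It assumes that each classifier reports a posterior probability of phishing computed under the same prior, and that the textual and visual evidence are conditionally independent given the class. The function name `fuse_posteriors` and all numeric values are hypothetical.

```python
def fuse_posteriors(p_text: float, p_image: float, prior: float = 0.5) -> float:
    """Fuse two phishing posteriors into one with Bayes' rule.

    Assumptions (not taken from the paper): both posteriors were computed
    under the same class prior, and the two evidence sources are
    conditionally independent given the class.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    for p in (p_text, p_image):
        if not 0.0 <= p <= 1.0:
            raise ValueError("posteriors must lie in [0, 1]")

    # Unnormalized joint score for each class: the product of the two
    # posteriors, divided once by the prior so it is not counted twice.
    phishing_score = p_text * p_image / prior
    legitimate_score = (1.0 - p_text) * (1.0 - p_image) / (1.0 - prior)

    total = phishing_score + legitimate_score
    if total == 0.0:
        # One classifier is certain of phishing, the other certain of the opposite.
        raise ValueError("classifiers are in total contradiction")
    return phishing_score / total


if __name__ == "__main__":
    # Hypothetical classifier outputs for one suspicious page.
    fused = fuse_posteriors(p_text=0.8, p_image=0.8)
    print(f"fused phishing probability: {fused:.3f}")
```

Under these assumptions, two classifiers that agree reinforce each other, while a classifier that reports exactly the prior leaves the other classifier's posterior unchanged.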
Keywords :
Bayes methods; Internet; computer crime; image classification; sensor fusion; text analysis; Bayes theory; Bayesian approach; Web page detection; classifier fusion; image classifier; text classifier; textual content-based anti-phishing; visual content-based anti-phishing; Bayesian methods; Feature extraction; Image color analysis; Visualization; Vocabulary; Web pages; classifier; data fusion; phishing detection; web page; Algorithms; Artificial Intelligence; Bayes Theorem; Computer Security; Crime; Data Mining; Humans; Models, Statistical; Pattern Recognition, Automated; Software; Software Validation; Statistics as Topic
Journal_Title :
Neural Networks, IEEE Transactions on
DOI :
10.1109/TNN.2011.2161999