DocumentCode
1797361
Title
Deobfuscation based on edit distance algorithm for spam filtering
Author
Xinwang Zhong
Author_Institution
Dept. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou, China
Volume
1
fYear
2014
fDate
13-16 July 2014
Firstpage
109
Lastpage
114
Abstract
Spam has grown rapidly on the Internet. An adversary obfuscates a spam message by misspelling words or inserting useless characters to mislead the spam filter's decision. Humans can still understand the original meaning of the camouflaged words, but the spam filter cannot recognize them. This paper focuses on the well-known obfuscation problem that uses non-alphabetical characters, e.g. Viagra is modified to V!@gr@. The string edit distance algorithm is revised to handle non-alphabetical characters. The proposed deobfuscation method outperforms the traditional string edit distance algorithm in the experiment.
Keywords
formal languages; information filtering; support vector machines; unsolicited e-mail; SVM; backtrack algorithm; deobfuscation method; nonalphabetical character handling; obfuscation problem; spam filtering; spam message; spamming problem; string edit distance algorithm; support vector machine; Abstracts; Barium; Indexes; Unsolicited electronic mail; Spam Filter
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics (ICMLC), 2014 International Conference on
Conference_Location
Lanzhou
ISSN
2160-133X
Print_ISBN
978-1-4799-4216-9
Type
conf
DOI
10.1109/ICMLC.2014.7009101
Filename
7009101