Title :
Detecting Worms Using Data Mining Techniques: Learning in the Presence of Class Noise
Author :
Ismail, Ismahani ; Marsono, Muhammad Nadzir ; Nor, Sulaiman Mohd
Author_Institution :
Fac. of Electr. Eng., Univ. Teknol. Malaysia, Johor Bahru, Malaysia
Abstract :
Worms are self-contained programs that spread over the Internet. They cause problems such as loss of information, information theft, and denial-of-service attacks. The first part of the paper evaluates worm detection based on content classification, using all machine learning techniques available in the WEKA data mining tool. The four most accurate and reasonably fast classifiers are identified for further analysis: Naive Bayes, J48, SMO, and Winnow. Results show that machine learning techniques can classify worms to an accuracy of 99%. In terms of accuracy, J48 performs better than the other algorithms, while Naive Bayes and Winnow perform best in terms of speed. The second part of the paper analyzes the accuracy of these four classifiers in the presence of class noise in the learning corpora. With class noise ranging between 0% and 50% injected into the positive and negative corpora, simulation results show a gradual decrease in accuracy and an increase in false positives and false negatives for all analyzed techniques. Class noise affects false positives more significantly than false negatives. The results show that worm detection with classification algorithms cannot tolerate class noise in the learning corpora.
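The class-noise injection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not detail. It assumes binary labels (1 = worm, 0 = benign) and flips the same fraction of labels in each class, so that both the positive and the negative corpus are affected. The function name `inject_class_noise` and the `seed` parameter are hypothetical.

```python
import random


def inject_class_noise(labels, noise_rate, seed=0):
    """Return a copy of `labels` with a fraction of each class flipped.

    Assumption: labels are 0 (benign) or 1 (worm), and `noise_rate`
    is the fraction of each class whose label is swapped.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    noisy = list(labels)
    for cls in (0, 1):
        # Indices of the samples that originally belong to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        n_flip = round(noise_rate * len(idx))
        for i in rng.sample(idx, n_flip):
            noisy[i] = 1 - cls  # mislabel the sample as the other class
    return noisy


if __name__ == "__main__":
    # Toy corpus: 50 worm samples and 50 benign samples.
    clean = [1] * 50 + [0] * 50
    for pct in range(0, 51, 10):
        noisy = inject_class_noise(clean, pct / 100)
        flipped = sum(a != b for a, b in zip(clean, noisy))
        print(f"noise {pct}%: {flipped} of {len(clean)} labels flipped")
```

A classifier would then be trained on the noisy labels and evaluated against the clean ones to observe how accuracy, false positives, and false negatives change with the noise rate.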
Keywords :
Internet; data mining; invasive software; learning (artificial intelligence); noise; pattern classification; Internet vulnerability; Naive Bayes method; WEKA data mining tool; Winnow algorithm; class noise; content classification; machine learning technique; self contained program; worm detection; Accuracy; Data mining; Feature extraction; Grippers; Noise; Payloads; Training; class noise; data-mining techniques; worm detection;
Conference_Title :
Signal-Image Technology and Internet-Based Systems (SITIS), 2010 Sixth International Conference on
Conference_Location :
Kuala Lumpur
Print_ISBN :
978-1-4244-9527-6
Electronic_ISBN :
978-0-7695-4319-2
DOI :
10.1109/SITIS.2010.41