Title :
Research on Web Spam Detection Based on Support Vector Machine
Author :
Jia, Zhiyang ; Li, Weiwei ; Gao, Wei ; Xia, Youming
Author_Institution :
Dept. of Inf. Sci. & Technol., Yunnan Univ., Lijiang, China
Abstract :
With the rapid development of the Internet, web pages created as web spam, which aim to deceive search engines and inflate their rankings in search results, have become prevalent. Web spam is a major problem for today's search engines, so search engines need to be able to detect it during crawling. Web spam detection is treated as a classification problem: a model built by a machine learning classification algorithm assigns a given web page to one of two categories, Normal or Spam. For the support vector machine classification model, a soft margin classifier based on a linear support vector machine was trained on the sample set, and penalty functions were defined over the links between web pages that appear to have similar characteristics. Both the content features and the link structure between web pages were used to build the classifier.
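The abstract does not give the objective function, so the following is only a minimal sketch of one way a soft margin linear SVM can be combined with a link-based penalty. It assumes the penalty is a squared difference between the scores of linked pages, which is an illustrative choice and not necessarily the authors' formulation. The names `train_linked_svm`, `predict`, `C`, `gamma`, the two content features, and the toy data are all hypothetical.

```python
def train_linked_svm(X, y, links, C=1.0, gamma=0.1, lr=0.01, epochs=2000):
    """Subgradient descent on an assumed objective (not taken from the paper):

        0.5 * ||w||^2
        + C * sum_i max(0, 1 - y_i * f(x_i))                   # soft margin hinge loss
        + gamma * sum_{(i, j) in links} (f(x_i) - f(x_j))**2   # link penalty

    where f(x) = w . x + b, and y_i is +1 for Spam and -1 for Normal.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        scores = [sum(wk * xk for wk, xk in zip(w, x)) + b for x in X]
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 term
        grad_b = 0.0
        # Hinge loss: only pages inside or beyond the margin contribute.
        for x, label, s in zip(X, y, scores):
            if label * s < 1:
                for k in range(n_features):
                    grad_w[k] -= C * label * x[k]
                grad_b -= C * label
        # Link penalty: pull the scores of linked pages toward each other.
        # The bias cancels in f(x_i) - f(x_j), so it gets no gradient here.
        for i, j in links:
            diff = scores[i] - scores[j]
            for k in range(n_features):
                grad_w[k] += 2 * gamma * diff * (X[i][k] - X[j][k])
        w = [wk - lr * g for wk, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify one page from its feature vector."""
    score = sum(wk * xk for wk, xk in zip(w, x)) + b
    return "Spam" if score >= 0 else "Normal"


# Toy data: two hypothetical content features per page, scaled to [0, 1].
X = [[0.9, 0.7], [0.8, 0.9], [0.1, 0.2], [0.3, 0.1]]
y = [1, 1, -1, -1]          # +1 = Spam, -1 = Normal
links = [(0, 1), (2, 3)]    # hypothetical hyperlinks between page indices

w, b = train_linked_svm(X, y, links)
labels = [predict(w, b, x) for x in X]
```

In this sketch `C` controls how heavily margin violations are penalized and `gamma` controls how strongly linked pages are pushed toward the same score. Setting `gamma=0` reduces it to an ordinary soft margin linear SVM on content features alone.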
Keywords :
Web services; information retrieval; pattern classification; search engines; support vector machines; unsolicited e-mail; Internet; Web page; Web spam detection; classification problem; crawling; linear support vector machine; machine learning classification algorithm; penalty function; search engine; soft margin classifier; Educational institutions; Feature extraction; Machine learning; Unsolicited electronic mail; Web pages; SVM; anti-spam; web spam
Conference_Title :
Communication Systems and Network Technologies (CSNT), 2012 International Conference on
Conference_Location :
Rajkot
Print_ISBN :
978-1-4673-1538-8
DOI :
10.1109/CSNT.2012.117