DocumentCode
3709042
Title
Detecting spam webpages through topic and semantics analysis
Author
Jing Wan;Mufan Liu;Junkai Yi;Xuechao Zhang
Author_Institution
Beijing University of Chemical Technology, China
fYear
2015
fDate
6/1/2015
Firstpage
1
Lastpage
7
Abstract
Spam webpages have posed great challenges to the development of search engines, and content spam is among the most commonly used spamming techniques. As Internet technologies have developed, content spam has become increasingly difficult to detect. Current methods for detecting webpages that use content spam rely primarily on statistical features, which have obvious limitations. This article proposes a spam webpage detection method based on topic and semantics analysis that uses two categories of features: semantic and statistical. Topic modeling is first performed on the contents of each webpage, mapping those contents into a topic space. Semantic analysis and calculation are then carried out in the topic space according to the distribution of topics. The resulting semantic features are combined with statistical features to classify webpages. Experimental results verified that the proposed method achieves better detection performance.
Keywords
"Semantics","Feature extraction","Analytical models","Search engines","Algorithm design and analysis","Mathematical model","Internet"
Publisher
IEEE
Conference_Title
Computer & Information Technology (GSCIT), 2015 Global Summit on
Print_ISBN
978-1-4673-6586-4
Type
conf
DOI
10.1109/GSCIT.2015.7353328
Filename
7353328