DocumentCode
2112930
Title
Co-training based semi-supervised Web spam detection
Author
Wei Wang ; Xiao-Dong Lee ; An-Lei Hu ; Guang-Gang Geng
Author_Institution
Comput. Network Inf. Center, China Internet Network Inf. Center, Beijing, China
fYear
2013
fDate
23-25 July 2013
Firstpage
789
Lastpage
793
Abstract
Traditional Web spam classifiers are trained only on labeled data (feature/label pairs). Labeled spam instances, however, are difficult, expensive, or time-consuming to obtain, as they require the effort of experienced human annotators. Meanwhile, unlabeled samples are relatively easy to collect. Semi-supervised learning addresses the classification problem by using large amounts of unlabeled data, together with the labeled data, to build better classifiers. This paper proposes two new semi-supervised learning algorithms to boost the performance of Web spam classifiers. The algorithms integrate traditional co-training with topological-dependency-based hyperlink learning. The proposed methods extend our previous work on self-training based semi-supervised Web spam detection. Experimental results with 100/200 labeled samples on the standard WEBSPAM-UK2006 benchmark show that the algorithms are effective.
Keywords
Internet; security of data; Web spam classifiers; classification problem; cotraining based semisupervised Web spam detection; human annotators; hyperlink learning; self-training based semisupervised Web spam detection; semisupervised learning algorithms; spam instances; topological dependency; Feature extraction; Information retrieval; Prediction algorithms; Semisupervised learning; Standards; Training; Unsolicited electronic mail;
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems and Knowledge Discovery (FSKD), 2013 10th International Conference on
Conference_Location
Shenyang
Type
conf
DOI
10.1109/FSKD.2013.6816301
Filename
6816301