• DocumentCode
    2118643
  • Title

    Effectively Detecting Content Spam on the Web Using Topical Diversity Measures

  • Author

    Cailing Dong ; Bin Zhou

  • Author_Institution
    Dept. of Inf. Syst., Univ. of Maryland, Baltimore, MD, USA
  • Volume
    1
  • fYear
    2012
  • fDate
    4-7 Dec. 2012
  • Firstpage
    266
  • Lastpage
    273
  • Abstract
    Recent studies on web spam detection have utilized various content-based and link-based features to construct a spam classification model. In this paper, we conduct a thorough analysis of content spam on the web using topic models and propose several novel topical diversity measures for content spam detection. We adopt the web spam benchmark data set WEBSPAM-UK2007 for evaluation, and the experimental results verify that by integrating our topical diversity measures the performance of the state-of-the-art web spam detection methods can be greatly improved. In addition, compared to existing features for training spam classification models, our topical diversity measures can achieve high spam detection performance using a small set of training data. In personalized web spam detection, the training data (i.e., a user's spam labeling results) are typically small. Our finding makes personalized web spam detection highly achievable. We develop an efficient and effective regression model using topical diversity measures for personalized web spam detection, and present some promising results obtained from an empirical study.
  • Keywords
    Internet; pattern classification; unsolicited e-mail; WEBSPAM-UK2007; Web spam benchmark data; content-based features; link-based features; personalized Web content spam detection; regression model; spam classification model; topic models; topical diversity measures; training data; training spam classification models; classification; personalization; topic model; web spam;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence and Intelligent Agent Technology (WI-IAT), 2012 IEEE/WIC/ACM International Conferences on
  • Conference_Location
    Macau
  • Print_ISBN
    978-1-4673-6057-9
  • Type

    conf

  • DOI
    10.1109/WI-IAT.2012.98
  • Filename
    6511895