• DocumentCode
    3390680
  • Title
    Characterizing comment spam in the blogosphere through content analysis
  • Author
    Bhattarai, Archana ; Rus, Vasile ; Dasgupta, Dipankar
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Memphis, Memphis, TN
  • fYear
    2009
  • fDate
    March 30 2009-April 2 2009
  • Firstpage
    37
  • Lastpage
    44
  • Abstract
    Spam is no longer limited to email and Web pages. The increasing penetration of spam in the form of comments on blogs and social networks is becoming a nuisance and a potential threat. In this work, we explore the challenges posed by this type of spam in the blogosphere, with substantial generalization to other social media. Thus, we investigate the characteristics of comment spam in blogs based on their content. The framework uses some previously explored methods for effectively extracting features of blog spam and also introduces a novel method of active learning from raw data without requiring training instances. This makes the approach more flexible and realistic for such applications. We also incorporate the concept of co-training for supervised learning to obtain accurate results. A preliminary evaluation of the proposed framework shows promising results.
  • Keywords
    Internet; learning (artificial intelligence); unsolicited e-mail; blogosphere; comment spam; content analysis; email; social network; supervised learning; Bayesian methods; Blogs; Electronic mail; Machine learning algorithms; Search engines; Social network services; Statistics; Unsolicited electronic mail; Web pages; Writing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence in Cyber Security, 2009. CICS '09. IEEE Symposium on
  • Conference_Location
    Nashville, TN
  • Print_ISBN
    978-1-4244-2769-7
  • Type
    conf
  • DOI
    10.1109/CICYBS.2009.4925088
  • Filename
    4925088