  • DocumentCode
    3532976
  • Title
    Mutual detection between spam blogs and keywords based on cooccurrence cluster seed
  • Author
    Ishida, Kazunari
  • Author_Institution
    Fac. of Policy Sci., Univ. of Shimane, Japan
  • fYear
    2009
  • fDate
    28-31 July 2009
  • Firstpage
    8
  • Lastpage
    13
  • Abstract
    This paper proposes a mutual detection mechanism between spam blogs and keywords for filtering spam blogs from updated blog data. Spam blogs are problematic in extracting useful marketing information from the blogosphere; they often appear to be rich sources of information based on individual opinion and social reputation. One characteristic of spam blogs is copied-and-pasted articles based on normal blogs and news articles. Another is multiple postings of the same article to increase the chances of exposure and income from advertising. Because of these characteristics, spam blogs share common keywords, and such blogs and keywords can form large spam bi-clusters. Based on such clusters, this paper explains how to detect spam blogs and spam keywords with mutual filtering. It reports that the maximum precision of the filtering is 95%, based on a preliminary experiment with approximately six months' updated blog data and a more detailed experiment with one day's data.
  • Keywords
    Web sites; advertising data processing; e-mail filters; information filtering; pattern clustering; unsolicited e-mail; advertising; cooccurrence cluster seed; copied-and-pasted article; marketing information extraction; mutual detection mechanism; social reputation; spam bi-cluster; spam blogosphere filtering; spam keyword; Advertising; Bipartite graph; Blogs; Costs; Data mining; Filtering; Information resources; Machine learning; Support vector machines; Unsolicited electronic mail
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Networked Digital Technologies, 2009. NDT '09. First International Conference on
  • Conference_Location
    Ostrava
  • Print_ISBN
    978-1-4244-4614-8
  • Electronic_ISBN
    978-1-4244-4615-5
  • Type
    conf
  • DOI
    10.1109/NDT.2009.5272171
  • Filename
    5272171
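
The abstract describes mutual filtering over a blog-keyword bi-cluster but gives no update rule, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes blogs are represented as keyword sets, that a seed set of spam keywords is already available (for example from a co-occurrence cluster), and that simple count thresholds decide membership. The function name `mutual_filter` and the parameters `min_spam_keywords`, `min_spam_blogs`, and `max_rounds` are hypothetical.

```python
from collections import defaultdict


def mutual_filter(blog_keywords, seed_keywords,
                  min_spam_keywords=2, min_spam_blogs=2, max_rounds=10):
    """Alternately flag blogs and keywords, starting from seed keywords.

    blog_keywords: dict mapping a blog id to the set of keywords it contains.
    seed_keywords: iterable of keywords assumed to be spam (the cluster seed).
    The thresholds are illustrative assumptions, not values from the paper.
    """
    # Invert the mapping so each keyword knows which blogs contain it.
    keyword_blogs = defaultdict(set)
    for blog, keywords in blog_keywords.items():
        for keyword in keywords:
            keyword_blogs[keyword].add(blog)

    spam_keywords = set(seed_keywords)
    spam_blogs = set()
    for _ in range(max_rounds):
        # A blog is flagged when it contains enough flagged keywords.
        new_blogs = {
            blog for blog, keywords in blog_keywords.items()
            if len(keywords & spam_keywords) >= min_spam_keywords
        }
        # A keyword is flagged when enough flagged blogs contain it.
        new_keywords = spam_keywords | {
            keyword for keyword, blogs in keyword_blogs.items()
            if len(blogs & new_blogs) >= min_spam_blogs
        }
        if new_blogs == spam_blogs and new_keywords == spam_keywords:
            break  # Fixed point reached: nothing new was flagged.
        spam_blogs, spam_keywords = new_blogs, new_keywords
    return spam_blogs, spam_keywords


# Toy data, invented purely for illustration.
blogs = {
    "b1": {"loan", "casino", "pills"},
    "b2": {"loan", "casino", "pills"},
    "b3": {"pills", "casino", "travel"},
    "b4": {"garden", "recipe", "travel"},
}
spam_blogs, spam_keywords = mutual_filter(blogs, {"loan", "casino"})
print(sorted(spam_blogs), sorted(spam_keywords))
```

In this toy run the seed flags `b1` and `b2`, those blogs promote `pills` to a spam keyword, and that in turn flags `b3`, which is the mutual reinforcement the abstract refers to. Fixed count thresholds are the simplest possible choice; a ratio-based or weighted score would be an equally plausible assumption.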