Title :
Mutual detection between spam blogs and keywords based on cooccurrence cluster seed
Author :
Ishida, Kazunari
Author_Institution :
Fac. of Policy Sci., Univ. of Shimane, Japan
Abstract :
This paper proposes a mutual detection mechanism between spam blogs and keywords for filtering spam blogs out of updated blog data. Spam blogs make it difficult to extract useful marketing information from the blogosphere, which often appears to be a rich source of information based on individual opinion and social reputation. One characteristic of spam blogs is that their articles are copied and pasted from normal blogs and news articles. Another is that the same article is posted multiple times to increase its chances of exposure and its advertising income. Because of these characteristics, spam blogs share common keywords, and such blogs and keywords can form large spam bi-clusters. Based on these clusters, this paper explains how to detect spam blogs and spam keywords through mutual filtering. It reports a maximum filtering precision of 95%, based on a preliminary experiment with approximately six months of updated blog data and a more detailed experiment with one day's data.
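The idea of mutual filtering over spam bi-clusters can be illustrated with a small sketch on a bipartite blog-keyword graph. This is not the author's algorithm. It is a hypothetical illustration under stated assumptions: the seed keywords are simply given (how the paper derives a seed from co-occurrence clusters is not described here), and the thresholds `blog_min_hits` and `keyword_min_blogs` are invented parameters. A blog is flagged when it contains enough spam keywords. A keyword is then flagged when it appears in enough flagged blogs. The two steps alternate until neither set changes.

```python
from collections import defaultdict


def mutual_detection(blog_keywords, seed_keywords,
                     blog_min_hits=2, keyword_min_blogs=2, max_iter=10):
    """Hypothetical sketch: alternately expand spam blogs and spam keywords.

    blog_keywords maps each blog id to the set of keywords it contains.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    spam_keywords = set(seed_keywords)
    spam_blogs = set()

    # Invert the bipartite graph: keyword -> set of blogs containing it.
    keyword_blogs = defaultdict(set)
    for blog, kws in blog_keywords.items():
        for kw in kws:
            keyword_blogs[kw].add(blog)

    for _ in range(max_iter):
        # Blogs sharing at least blog_min_hits spam keywords are flagged.
        new_blogs = {b for b, kws in blog_keywords.items()
                     if len(kws & spam_keywords) >= blog_min_hits}
        # Keywords occurring in at least keyword_min_blogs flagged blogs join the spam set.
        new_keywords = spam_keywords | {
            kw for kw, blogs in keyword_blogs.items()
            if len(blogs & new_blogs) >= keyword_min_blogs}
        if new_blogs == spam_blogs and new_keywords == spam_keywords:
            break  # fixed point reached
        spam_blogs, spam_keywords = new_blogs, new_keywords

    return spam_blogs, spam_keywords


# Toy data (invented for illustration only).
blogs = {
    "spam1": {"loan", "casino", "pills"},
    "spam2": {"loan", "casino", "pills"},
    "spam3": {"pills", "replica", "loan"},
    "spam4": {"replica", "pills", "watch"},
    "normal1": {"camera", "travel", "watch"},
    "normal2": {"recipe", "travel"},
}
spam_blogs, spam_keywords = mutual_detection(blogs, {"loan", "casino"})
print(sorted(spam_blogs))     # ['spam1', 'spam2', 'spam3']
print(sorted(spam_keywords))  # ['casino', 'loan', 'pills']
```

Starting from the seed {loan, casino}, the first round flags spam1 and spam2, which pulls in "pills". The second round then flags spam3 through "pills" and "loan". Blogs that share only a single spam keyword, such as spam4, and the normal blogs stay unflagged under these thresholds. This shows both the mutual reinforcement and how the thresholds trade recall for precision.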
Keywords :
Web sites; advertising data processing; e-mail filters; information filtering; pattern clustering; unsolicited e-mail; advertising; cooccurrence cluster seed; copied-and-pasted article; marketing information extraction; mutual detection mechanism; social reputation; spam bi-cluster; spam blogosphere filtering; spam keyword; Advertising; Bipartite graph; Blogs; Costs; Data mining; Filtering; Information resources; Machine learning; Support vector machines; Unsolicited electronic mail
Conference_Title :
First International Conference on Networked Digital Technologies (NDT '09), 2009
Conference_Location :
Ostrava
Print_ISBN :
978-1-4244-4614-8
Electronic_ISBN :
978-1-4244-4615-5
DOI :
10.1109/NDT.2009.5272171