DocumentCode :
3105615
Title :
Latent Friend Mining from Blog Data
Author :
Shen, Dou ; Sun, Jian-Tao ; Yang, Qiang ; Chen, Zheng
Author_Institution :
Dept. of Comput. Sci. & Eng., Hong Kong Univ. of Sci. & Technol., Kowloon
fYear :
2006
fDate :
18-22 Dec. 2006
Firstpage :
552
Lastpage :
561
Abstract :
The rapid growth of blog (also known as "weblog") data provides a rich resource for social community mining. In this paper, we put forward a novel research problem: mining the latent friends of bloggers based on the contents of their blog entries. Latent friends are defined in this paper as people who share a similar topic distribution in their blogs. These people may not actually know each other, but they have the interest and potential to find each other. Three approaches are designed for latent friend detection. The first, the cosine similarity-based method, determines the similarity between bloggers by calculating the cosine similarity between the contents of their blogs. The second, the topic-based method, discovers latent topics using a latent topic model and then calculates the similarity at the topic level. The third, the two-level similarity-based method, is conducted in two stages. In the first stage, an existing topic hierarchy is exploited to build a topic distribution for each blogger. In the second stage, a detailed similarity comparison is conducted among the bloggers found in the first stage to be close in interest to each other. Our experimental results show that both the topic-based and two-level similarity-based methods work well, and the last approach performs much better than the first two. We also give a detailed analysis of the advantages and disadvantages of the different approaches.
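As an illustration of the first approach only, the cosine similarity between the contents of two blogs might be computed roughly as follows. This is a minimal sketch and not the authors' implementation: the whitespace tokenizer, the raw term counts (no weighting or stop-word removal), the function names, and the sample blog texts are all assumptions made for the example.

```python
import math
from collections import Counter


def term_counts(text):
    """Count terms after lowercasing and splitting on whitespace (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty blog is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_latent_friends(blogs, target):
    """Rank every other blogger by similarity to `target`, most similar first."""
    vectors = {name: term_counts(text) for name, text in blogs.items()}
    scores = [
        (other, cosine_similarity(vectors[target], vectors[other]))
        for other in vectors
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data: each blogger's entries concatenated into one string.
blogs = {
    "alice": "data mining and machine learning for text",
    "bob": "text mining with machine learning",
    "carol": "hiking trails and mountain photography",
}
ranking = rank_latent_friends(blogs, "alice")
```

Here `ranking` lists "bob" ahead of "carol" for "alice", because their blogs share more terms. The topic-based and two-level methods described in the abstract would replace the raw term vectors with topic distributions, which this sketch does not attempt.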
Keywords :
Internet; data mining; Weblog data; cosine similarity-based method; latent friend mining; latent topic model; social community mining; topic-based method; Asia; Computer science; Data engineering; Data mining; Data privacy; Electronic mail; Information services; Internet; Social network services; Web sites;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2006. ICDM '06. Sixth International Conference on
Conference_Location :
Hong Kong
ISSN :
1550-4786
Print_ISBN :
0-7695-2701-7
Type :
conf
DOI :
10.1109/ICDM.2006.95
Filename :
4053081