DocumentCode :
2757443
Title :
Automatic Tag Recommendation for Weblogs
Author :
Liu, Yicen ; Liu, Mingrong ; Chen, Xing ; Xiang, Liang ; Yang, Qing
Author_Institution :
Inst. of Autom., Chinese Acad. of Sci., Beijing, China
Volume :
1
fYear :
2009
fDate :
25-26 July 2009
Firstpage :
546
Lastpage :
549
Abstract :
There has been much research on recommending tags for weblogs. In this paper, we propose a novel automatic tag recommendation algorithm that can process large-scale, real-time data effectively and efficiently. Most existing work on tag suggestion first mines the relationship between testing and training data and then assigns the top-ranked tags of the most related training data to the testing object. However, this approach ignores the internal relationship between tags and weblogs. According to our research, more than 43% of the tags assigned by weblog users actually appear in the body of the text. Moreover, the term frequency distribution, the paragraph frequency distribution, and the first occurrence position of tags differ markedly from those of non-tags in the text. In this paper, the tags of a weblog are assigned in two steps. First, probability distributions of the word attributes are trained on labeled weblogs, and keywords of a testing weblog are extracted as the first part of its tags based on those distributions. Then the remaining tags are derived from the first part with the help of a Latent Semantic Indexing (LSI) model. Experiments on a large-scale weblog tagging dataset show that the average tagging time for a new weblog is less than 0.02 seconds, and over 74% of testing weblogs are correctly labeled with the top 15 tags.
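The first step described in the abstract can be illustrated with a minimal sketch. It computes the three word attributes named above (term frequency, paragraph frequency, first occurrence position) and ranks candidate words by a naive-Bayes-style log-likelihood ratio of tag versus non-tag. The bucketing, the add-one smoothing, the way the three distributions are combined, and all function names are assumptions made for illustration; the abstract does not specify them, and the LSI expansion step is not sketched here.

```python
import math
import re
from collections import Counter

# Assumed bucket counts: tf capped at 5, pf capped at 5, position in quartiles.
N_BUCKETS = (5, 5, 4)


def word_attributes(text):
    """Return {word: (term_freq, paragraph_freq, first_position)} for one weblog.

    first_position is the relative offset (0.0 to <1.0) of the first occurrence.
    Paragraphs are assumed to be separated by blank lines.
    """
    tf, pf, first = Counter(), Counter(), {}
    position = 0
    for para in text.split("\n\n"):
        words = re.findall(r"[a-z0-9]+", para.lower())
        tf.update(words)
        pf.update(set(words))
        for w in words:
            first.setdefault(w, position)
            position += 1
    total = max(position, 1)
    return {w: (tf[w], pf[w], first[w] / total) for w in tf}


def bucket(attrs):
    """Discretize the three attributes so they can be counted in histograms."""
    tf, pf, pos = attrs
    return (min(tf, 5), min(pf, 5), min(int(pos * 4), 3))


def train(labeled_weblogs):
    """Build per-attribute histograms for tag words (True) and non-tag words (False)."""
    counts = {True: [Counter() for _ in range(3)],
              False: [Counter() for _ in range(3)]}
    for text, tags in labeled_weblogs:
        tags = {t.lower() for t in tags}
        for word, attrs in word_attributes(text).items():
            for i, b in enumerate(bucket(attrs)):
                counts[word in tags][i][b] += 1
    return counts


def score(attrs, counts):
    """Sum of log-likelihood ratios over the three attributes (add-one smoothed)."""
    s = 0.0
    for i, b in enumerate(bucket(attrs)):
        p_tag = (counts[True][i][b] + 1) / (sum(counts[True][i].values()) + N_BUCKETS[i])
        p_non = (counts[False][i][b] + 1) / (sum(counts[False][i].values()) + N_BUCKETS[i])
        s += math.log(p_tag / p_non)
    return s


def recommend(text, counts, k=5):
    """Return the k words of the weblog that look most tag-like."""
    attrs = word_attributes(text)
    return sorted(attrs, key=lambda w: score(attrs[w], counts), reverse=True)[:k]
```

For example, after `counts = train([("python is fun\n\npython rocks", {"python"})])`, calling `recommend("java is ok\n\njava wins", counts, k=1)` ranks the word that is frequent, spread across paragraphs, and early in the text first.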
Keywords :
Web sites; data mining; identification technology; indexing; Weblogs; automatic tag recommendation; large-scale data process; latent semantic indexing; real-time data process; Frequency; Indexing; Internet; Large scale integration; Large-scale systems; Probability distribution; Tagging; Testing; Training data; Web pages; recommendation system
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Technology and Computer Science, 2009. ITCS 2009. International Conference on
Conference_Location :
Kiev
Print_ISBN :
978-0-7695-3688-0
Type :
conf
DOI :
10.1109/ITCS.2009.263
Filename :
5190132