DocumentCode :
3703549
Title :
LDA based semi-supervised learning from streaming short text
Author :
Ji-De Chen;Hung-Yu Kao
Author_Institution :
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
fYear :
2015
Firstpage :
1
Lastpage :
8
Abstract :
With the rapid growth of real-time social media such as Twitter, many users share and discuss topics of interest on these platforms. A hashtag is a type of metadata tag that lets users annotate the topics of their tweets. Hashtags are also useful for research; for example, observing hashtag trends can improve the performance of event detection. Although Twitter is growing rapidly, hashtag usage has not grown as expected: fewer than 20% of the tweets in our dataset contain hashtags. We believe this is because most users do not know which hashtags suit the tweets they post. Recommending suitable hashtags to users is therefore one way to address the low hashtag usage rate. Hashtag recommendation is a supervised learning problem, and more labeled training data generally yields better prediction performance. However, because hashtags are used so rarely, labeled data for hashtag recommendation is scarce. We therefore exploit unlabeled data, i.e., non-hashtag tweets, to address this problem. Although unlabeled data is plentiful, directly adding all non-hashtag tweets to the training set may not help the model. To overcome this issue, we apply weight-updating mechanisms that filter out the useless portions of the non-hashtag tweets. Because Twitter operates in real time, these mechanisms must also account for the temporal characteristics of hashtags. Our experimental results show that extending the original training data with non-hashtag tweets outperforms baseline methods that train the model on labeled data only.
Keywords :
"Twitter","Tagging","Training","Data models","Training data","Media","Supervised learning"
Publisher :
ieee
Conference_Title :
Data Science and Advanced Analytics (DSAA), 2015 IEEE International Conference on
Print_ISBN :
978-1-4673-8272-4
Type :
conf
DOI :
10.1109/DSAA.2015.7344830
Filename :
7344830