DocumentCode :
1908688
Title :
Mining Parallel Corpus from Sina Microblog
Author :
Haitao Xing ; Muyun Yang ; Haoliang Qi ; Sheng Li ; Tiejun Zhao
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
fYear :
2013
fDate :
17-19 Aug. 2013
Firstpage :
99
Lastpage :
102
Abstract :
Finding parallel corpora, a specific type of information, on microblogging sites with millions of users, such as Sina Microblog, is a challenging task. This paper investigates the feasibility of mining such data from usernames, hash tags, and user relations, using three different methods. The initial experiment is encouraging, given the current restriction of limited access to microblog content.
Keywords :
Web sites; data mining; Sina microblog; hash tag; limited microblog content access; microblogging sites; parallel corpus mining; user relations; Adaptation models; Computer science; Data mining; Natural language processing; Real-time systems; Twitter; Follower; Hash tag; Parallel Corpus Mining; Sina Microblog; Username
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Asian Language Processing (IALP), 2013 International Conference on
Conference_Location :
Urumqi
Type :
conf
DOI :
10.1109/IALP.2013.29
Filename :
6646013