DocumentCode
1908688
Title
Mining Parallel Corpus from Sina Microblog
Author
Haitao Xing ; Muyun Yang ; Haoliang Qi ; Sheng Li ; Tiejun Zhao
Author_Institution
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
fYear
2013
fDate
17-19 Aug. 2013
Firstpage
99
Lastpage
102
Abstract
Finding parallel corpora, a specific type of information, on microblogging sites with millions of users, such as Sina Microblog, is a challenging task. This paper investigates the feasibility of mining such data from usernames, hash tags, and user relations, using three different methods. The initial experimental results are encouraging given the current restrictions on access to microblog content.
Keywords
Web sites; data mining; Sina microblog; data mining; hash tag; limited microblog content access; microblogging sites; parallel corpus mining; user relations; Adaptation models; Computer science; Data mining; Natural language processing; Real-time systems; Twitter; Follower; Hash tag; Parallel Corpus Mining; Sina Microblog; Username;
fLanguage
English
Publisher
ieee
Conference_Titel
Asian Language Processing (IALP), 2013 International Conference on
Conference_Location
Urumqi
Type
conf
DOI
10.1109/IALP.2013.29
Filename
6646013