Mining Parallel Corpus from Sina Microblog

Author

Haitao Xing ; Muyun Yang ; Haoliang Qi ; Sheng Li ; Tiejun Zhao

Author_Institution

Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China

fYear

2013

fDate

17-19 Aug. 2013

Firstpage

Lastpage

102

Abstract

Finding parallel corpora, a specific type of information, on microblogging sites with millions of users, such as Sina Microblog, is a challenging task. This paper investigates the feasibility of mining such data from usernames, hash tags, and user relations, using three different methods. The initial experiments are encouraging, given the current restriction of limited access to microblog content.

Keywords

Web sites; data mining; Sina microblog; hash tag; limited microblog content access; microblogging sites; parallel corpus mining; user relations; Adaptation models; Computer science; Data mining; Natural language processing; Real-time systems; Twitter; Follower; Hash tag; Parallel Corpus Mining; Sina Microblog; Username;

fLanguage

English

Publisher

IEEE

Conference_Titel

Asian Language Processing (IALP), 2013 International Conference on

Conference_Location

Urumqi

Type

conf

DOI

10.1109/IALP.2013.29

Filename

6646013

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=1908688