• DocumentCode
    1908688
  • Title

    Mining Parallel Corpus from Sina Microblog

  • Author

    Haitao Xing ; Muyun Yang ; Haoliang Qi ; Sheng Li ; Tiejun Zhao

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
  • fYear
    2013
  • fDate
    17-19 Aug. 2013
  • Firstpage
    99
  • Lastpage
    102
  • Abstract
    Finding parallel corpora, a specific type of information, on microblogging sites with millions of users, such as Sina Microblog, is a challenging task. This paper investigates the feasibility of mining such data from usernames, hash tags, and user relations, using three different methods. The initial experimental results are encouraging, given the current restrictions on access to microblog content.
  • Keywords
    Web sites; data mining; Sina microblog; data mining; hash tag; limited microblog content access; microblogging sites; parallel corpus mining; user relations; Adaptation models; Computer science; Data mining; Natural language processing; Real-time systems; Twitter; Follower; Hash tag; Parallel Corpus Mining; Sina Microblog; Username
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing (IALP), 2013 International Conference on
  • Conference_Location
    Urumqi
  • Type

    conf

  • DOI
    10.1109/IALP.2013.29
  • Filename
    6646013