DocumentCode :
3124340
Title :
Chinese new word extraction from MicroBlog data
Author :
Qi-Long Su ; Bing-Quan Liu
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
Volume :
04
fYear :
2013
fDate :
14-17 July 2013
Firstpage :
1874
Lastpage :
1879
Abstract :
Chinese new word extraction is an important task in Chinese natural language processing, and MicroBlog has become a major venue for the creation and dissemination of new words. Although many effective methods have been proposed, little research addresses Internet texts, especially MicroBlog texts. In this paper, we study a MicroBlog-oriented method for new word extraction. First, we analyze the performance of classical statistical measures in extracting new words from MicroBlog texts. Second, we base our work on Branch Entropy and propose a modified method that addresses the shortcomings of statistical measures and the characteristics of MicroBlog texts. Experimental results demonstrate that our method is feasible and effective. Lastly, we show four types of new words extracted from MicroBlog.
Keywords :
Internet; Web sites; entropy; natural language processing; statistical analysis; text analysis; text detection; Chinese natural language processing; Chinese new word extraction; Internet text; branch entropy; microblog data; microblog texts; microblog-oriented method; new word creation; statistical measures; Abstracts; Data mining; Erbium; Support vector machines; Vocabulary; Branch entropy; MicroBlog; Natural language processing; New word extraction; Statistical measure;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2013 International Conference on
Conference_Location :
Tianjin
Type :
conf
DOI :
10.1109/ICMLC.2013.6890901
Filename :
6890901