Title :
Chinese new word extraction from MicroBlog data
Author :
Qi-Long Su ; Bing-Quan Liu
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
Abstract :
Chinese new word extraction is an important task in Chinese natural language processing, and MicroBlog has become a major venue for the creation and dissemination of new words. Although many effective methods have been proposed, there is a lack of research on Internet texts, especially MicroBlog texts. In this paper, we study a MicroBlog-oriented method for new word extraction. First, we analyze the performance of classical statistical measures in extracting new words from MicroBlog texts. Second, we base our work on Branch Entropy and, to address the shortcomings of statistical measures and the characteristics of MicroBlog texts, propose a modified method. Experimental results demonstrate that our method is feasible and effective. Lastly, we show four types of new words extracted from MicroBlog.
Keywords :
Internet; Web sites; entropy; natural language processing; statistical analysis; text analysis; text detection; Chinese natural language processing; Chinese new word extraction; Internet text; branch entropy; microblog data; microblog texts; microblog-oriented method; new word creation; statistical measures; Abstracts; Data mining; Erbium; Support vector machines; Vocabulary; Branch entropy; MicroBlog; Natural language processing; New word extraction; Statistical measure;
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2013 International Conference on
Conference_Location :
Tianjin
DOI :
10.1109/ICMLC.2013.6890901