Title :
The Construction of a Kind of Chat Corpus in Chinese Word Segmentation
Author :
Xia Yang;Peng Jin;Xingyuan Chen
Author_Institution :
Lab. of Intell. Inf. Process. &
Abstract :
In this paper, we present a chat corpus for Chinese word segmentation and describe its construction process. The corpus is built by combining automatic segmentation with manual correction. The automatic segmentation is performed using the Natural Language Processing and Information Retrieval (NLPIR) tool. For the manual correction, errors produced by NLPIR are categorized and annotation suggestions are put forward. This preliminary study combines the two methods in a way that could easily be extended to other chat texts. Moreover, the resulting corpus could provide a good standard for research on Chinese word segmentation, especially for dialogue.
Keywords :
"Manuals","Internet","Information processing","Buildings","Tagging","Dictionaries","Computers"
Conference_Title :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2015 IEEE / WIC / ACM International Conference on
DOI :
10.1109/WI-IAT.2015.196