DocumentCode
476193
Title
A morpheme-based lexical chunking system for Chinese
Author
Fu, Guo-hong ; Kit, Chun-yu ; Webster, Jonathan J.
Author_Institution
Sch. of Comput. Sci. & Technol., Heilongjiang Univ., Harbin
Volume
5
fYear
2008
fDate
12-15 July 2008
Firstpage
2455
Lastpage
2460
Abstract
Chinese lexical analysis consists of word segmentation and part-of-speech tagging. Most previous studies treat them as two separate tasks. In this paper, we formalize the two processes as a unified chunking task on a sequence of morphemes and present an integrated lexical analysis system for Chinese based on lexicalized hidden Markov models. In this way, both contextual lexical information and word-internal morphological features can be statistically explored and combined for disambiguation and unknown word resolution. Experimental results show that the proposed system outperforms several baselines, illustrating the benefits of the unified lexical chunking method with morphemes as the basic units.
Keywords
hidden Markov models; natural language processing; Chinese lexical analysis; hidden Markov model; morpheme-based lexical chunking system; part-of-speech tagging; statistical analysis; word segmentation; word-internal morphological feature; Computer science; Cybernetics; Information analysis; Information retrieval; Machine learning; Morphology; Natural languages; Tagging; Lexical chunking
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2008 International Conference on
Conference_Location
Kunming
Print_ISBN
978-1-4244-2095-7
Electronic_ISBN
978-1-4244-2096-4
Type
conf
DOI
10.1109/ICMLC.2008.4620820
Filename
4620820