DocumentCode
442058
Title
Chinese word segmentation based on A-priori and adjacent characters
Author
Wang, Ye ; Huang, Shang-Teng
Author_Institution
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., China
Volume
6
fYear
2005
fDate
18-21 Aug. 2005
Firstpage
3808
Abstract
Chinese word segmentation is an important and difficult problem because written Chinese does not mark word boundaries with spaces. This paper presents a segmentation algorithm based on adjacent characters and A-priori. The new method uses adjacent-character information to join n-grams with their adjacent characters. Experimental results on the LDC95T13 Chinese collection show that the new method performs remarkably better than mutual-information-based methods.
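The abstract only outlines the method, so the following is a minimal, hypothetical sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes that "A-priori" refers to Apriori-style level-wise mining, that n-grams are grown by joining each frequent n-gram with its right-adjacent character, and that a candidate is kept only if its suffix was also frequent and its count reaches a support threshold. The names `frequent_ngrams`, `segment`, `min_support` and `max_n` are illustrative inventions, and forward maximum matching is swapped in as a simple way to turn the mined n-grams into a segmentation.

```python
from collections import Counter


def frequent_ngrams(sentences, min_support=2, max_n=4):
    """Apriori-style mining of frequent character n-grams (hypothetical sketch).

    Level 1 counts single characters. Each later level considers only the
    (n+1)-grams formed by joining a frequent n-gram with its right-adjacent
    character, and only when the n-character suffix is also frequent (the
    Apriori property: every substring of a frequent string is frequent).
    """
    counts = Counter(ch for s in sentences for ch in s)
    level = {g: c for g, c in counts.items() if c >= min_support}
    frequent = dict(level)
    n = 1
    while level and n < max_n:
        candidates = Counter()
        for s in sentences:
            for i in range(len(s) - n):
                # Join the n-gram at i with its adjacent character s[i + n].
                if s[i:i + n] in level and s[i + 1:i + n + 1] in level:
                    candidates[s[i:i + n + 1]] += 1
        level = {g: c for g, c in candidates.items() if c >= min_support}
        frequent.update(level)
        n += 1
    return frequent


def segment(sentence, lexicon, max_n=4):
    """Forward maximum matching against the mined n-gram lexicon (assumption)."""
    words, i = [], 0
    while i < len(sentence):
        # Try the longest piece first; single characters always match.
        for length in range(min(max_n, len(sentence) - i), 0, -1):
            piece = sentence[i:i + length]
            if length == 1 or piece in lexicon:
                words.append(piece)
                i += length
                break
    return words


corpus = ["上海交通大学", "交通大学很大", "上海很大"]
lexicon = frequent_ngrams(corpus, min_support=2, max_n=4)
print(segment("上海交通大学", lexicon))  # ['上海', '交通大学']
```

On this toy corpus, the bigram 海交 occurs only once and is pruned, so no longer n-gram starting with it is ever counted. The Apriori pruning exists for exactly this reason: it keeps the candidate set small. A real system would also need a scoring rule for overlapping candidates, which the abstract does not specify.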
Keywords
natural languages; word processing; A-priori based algorithm; Chinese word segmentation; adjacent characters algorithm; Computer science; Cybernetics; Dictionaries; Gallium nitride; Machine learning; Mutual information; Statistical analysis; Sun; Testing; A-priori; Word segmentation; adjacent characters; n-grams
fLanguage
English
Publisher
ieee
Conference_Titel
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location
Guangzhou, China
Print_ISBN
0-7803-9091-1
Type
conf
DOI
10.1109/ICMLC.2005.1527603
Filename
1527603