DocumentCode
507554
Title
Automatic Word Segmentation for Chinese Classics of Tea Based on Tree-Pruning
Author
Fang, Miao ; Jiang, Yi ; Zhao, Qi ; Jiang, Xin
Author_Institution
Northeastern Univ. at Qinhuangdao, Qinhuangdao, China
Volume
1
fYear
2009
fDate
Nov. 30 2009-Dec. 1 2009
Firstpage
438
Lastpage
441
Abstract
Automatic word segmentation is vital for the reading, comprehension, and translation of classics. However, the large number of special terms, allusions, and proper names in the classics makes word segmentation difficult. Taking classics of tea as the subject of research, this paper proposes a method that uses likelihood ratio statistics to identify two-character, three-character, and multi-character word candidates, and then segments classics of tea automatically with a tree-pruning algorithm. The computational complexity of the tree-pruning algorithm is O(LN), where L is the number of Chinese characters in the longest word. Experiments show that the method achieves better word-segmentation results.
Keywords
computational complexity; trees (mathematics); word processing; Chinese classics; automatic word segmentation; computation complexity; likelihood ratio statistics; tree-pruning algorithm; Dictionaries; Frequency; Gaussian distribution; History; Knowledge acquisition; Statistical distributions; Statistics; classics of tea; segmentation; tree-pruning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Knowledge Acquisition and Modeling, 2009. KAM '09. Second International Symposium on
Conference_Location
Wuhan
Print_ISBN
978-0-7695-3888-4
Type
conf
DOI
10.1109/KAM.2009.80
Filename
5362115