DocumentCode
542316
Title
Automatic new word extraction method
Author
Shi, Qin ; Shen, Li Qin ; Chai, Hai Xin
Author_Institution
IBM China Research Laboratory, China
Volume
1
fYear
2002
fDate
13-17 May 2002
Abstract
New words are difficult to extract automatically in languages whose written text has no word boundaries, such as Chinese and Japanese. In this paper, we present a statistical method for extracting new words from a large corpus without word boundaries. Based on the Generalized Suffix Tree (GST) data structure, we define the New Word Pattern (NWP) and the Segmentation Boundary Pattern (SBP) to separate input strings into small pieces, and we offer a practical and efficient algorithm for obtaining the proper words from the GST.
Keywords
Manuals
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
Conference_Location
Orlando, FL, USA
ISSN
1520-6149
Print_ISBN
0-7803-7402-9
Type
conf
DOI
10.1109/ICASSP.2002.5743876
Filename
5743876