DocumentCode :
2019774
Title :
Word Segmentation Method Based on Inductive Learning and Segmentation Rule
Author :
Wang, Zhongjian ; Araki, Kenji ; Tochinai, Koji
Author_Institution :
Harbin Univ. of Commerce, Harbin
Volume :
1
fYear :
2008
fDate :
17-18 Oct. 2008
Firstpage :
95
Lastpage :
98
Abstract :
A word segmentation method based on inductive learning for non-segmented languages uses only the surface information of a character string, so it does not depend on any specific language. The method recursively extracts character strings that occur frequently in a text as word candidates, and extracts segmentation rules with context information to deal with segmentation ambiguity. It classifies the extracted word candidates into different ranks according to how they were extracted, and segments a text into words using the extracted candidates. Through a proofreading process, erroneous segmentations are corrected, and the ranking of word candidates and the segmentation rules are updated. Evaluation experiments showed the effectiveness of the method for Japanese and Chinese word segmentation.
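The abstract's core idea, treating frequently recurring character strings as word candidates and then segmenting text with them, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names `extract_candidates` and `segment`, the `min_count` and `max_len` parameters, and the greedy longest-match strategy are all assumptions made for illustration. The sketch also omits the recursive extraction, the candidate ranking, the context-based segmentation rules, and the proofreading feedback that the abstract describes.

```python
from collections import Counter


def extract_candidates(text, min_count=2, max_len=4):
    """Collect substrings of length 2..max_len that occur at least min_count times.

    Hypothetical stand-in for the paper's candidate extraction: it uses only
    surface character strings, with no dictionary or language-specific data.
    """
    counts = Counter(
        text[i:i + n]
        for n in range(2, max_len + 1)
        for i in range(len(text) - n + 1)
    )
    return {s for s, c in counts.items() if c >= min_count}


def segment(text, candidates):
    """Greedy longest-match segmentation (an assumed strategy, not the paper's).

    Characters that match no candidate are emitted as single-character tokens.
    """
    max_len = max((len(c) for c in candidates), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest possible candidate first, down to length 2.
        for n in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + n] in candidates:
                tokens.append(text[i:i + n])
                i += n
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens


if __name__ == "__main__":
    sample = "abcabcxab"
    print(segment(sample, extract_candidates(sample)))
```

Because the sketch relies only on character-level frequency counts, the same code runs unchanged on Japanese or Chinese strings. Resolving ambiguous splits would require something like the ranking and context rules mentioned in the abstract.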
Keywords :
learning (artificial intelligence); natural language processing; Chinese word segmentation; Japanese word segmentation; character string; inductive learning; nonsegmented language; segmentation ambiguity; segmentation rule; word segmentation method; Business; Computational intelligence; Data mining; Design methodology; Dictionaries; Frequency; Internet; Learning systems; Natural languages; Space technology; Inductive Learning; Segmentation Rule; Word Segmentation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence and Design, 2008. ISCID '08. International Symposium on
Conference_Location :
Wuhan
Print_ISBN :
978-0-7695-3311-7
Type :
conf
DOI :
10.1109/ISCID.2008.75
Filename :
4725565