Title :
Word Segmentation of Chinese Text with Multiple Hybrid Methods
Author :
Wang, Zhongjian ; Xu, Jun ; Araki, Kenji ; Tochinai, Koji
Author_Institution :
Harbin Univ. of Commerce, Harbin, China
Abstract :
To handle unknown words and segmentation ambiguity, segmentation rules and a tri-gram model were combined with an inductive learning method. The rules, acquired manually by analyzing a tagged corpus, were used for elementary segmentation and to make the subsequent steps more effective. The inductive learning method recursively recognized and extracted unknown words from the segmented text. The tri-gram model was used to resolve segmentation ambiguity by calculating a sentence probability for each segmentation candidate and selecting the better one. Experimental results indicated that unknown-word processing improved and segmentation errors were reduced.
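As a rough illustration of the tri-gram selection step described in the abstract, the sketch below scores competing segmentations of one sentence and keeps the more probable one. It is not the authors' implementation: the function names, the toy corpus, and the add-one smoothing are all assumptions made for this example, since the abstract does not say how the model was trained or smoothed.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


def train_trigram(sentences):
    """Count trigrams and their two-word contexts in pre-segmented sentences."""
    tri, ctx, vocab = Counter(), Counter(), set()
    for words in sentences:
        padded = [BOS, BOS] + list(words) + [EOS]
        vocab.update(padded)
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2:i + 1])] += 1
            ctx[tuple(padded[i - 2:i])] += 1
    return tri, ctx, len(vocab)


def log_prob(words, model):
    """Sentence log-probability under the trigram model.

    Add-one smoothing is an assumption of this sketch, used only so that
    unseen trigrams do not get zero probability.
    """
    tri, ctx, vocab_size = model
    padded = [BOS, BOS] + list(words) + [EOS]
    total = 0.0
    for i in range(2, len(padded)):
        gram = tuple(padded[i - 2:i + 1])
        context = tuple(padded[i - 2:i])
        total += math.log((tri[gram] + 1) / (ctx[context] + vocab_size))
    return total


def best_segmentation(candidates, model):
    """Return the candidate segmentation with the highest sentence probability."""
    return max(candidates, key=lambda words: log_prob(words, model))


# Toy segmented corpus, invented for illustration only.
corpus = [
    ["研究", "生命", "起源"],
    ["他", "研究", "生命"],
    ["研究生", "学习"],
]
model = train_trigram(corpus)

# Two candidate segmentations of the ambiguous string 研究生命起源.
candidates = [
    ["研究", "生命", "起源"],
    ["研究生", "命", "起源"],
]
print(" / ".join(best_segmentation(candidates, model)))
```

With this toy corpus the first candidate wins because all of its trigrams were observed, while the second relies on unseen trigrams. A real system would train on a large tagged corpus and would likely use a stronger smoothing scheme.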
Keywords :
learning (artificial intelligence); word processing; Chinese text; inductive learning method; multiple hybrid methods; trigram model; word segmentation; words processing; Business; Computer errors; Dictionaries; Internet; Learning systems; Natural languages; Probability; Statistics; Text processing; Text recognition;
Conference_Title :
Computational Intelligence and Software Engineering, 2009. CiSE 2009. International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-4507-3
Electronic_ISBN :
978-1-4244-4507-3
DOI :
10.1109/CISE.2009.5363684