DocumentCode :
1615878
Title :
Kana-to-kanji conversion method using Markov chain model of words in bunsetsu
Author :
Kato, Shozo ; Araki, Chikahiro ; Hashimukai, Shinichiro ; Ogoshi, Yasuhiro ; Mori, Mikio ; Taniguchi, Shuji
Author_Institution :
Dept. of Electron. & Inf. Eng., Fukui Nat. Coll. of Technol., Sabae, Japan
fYear :
2010
Firstpage :
154
Lastpage :
160
Abstract :
We previously proposed a kana-to-kanji conversion method for non-segmented kana sentences that uses a Markov chain model of words in a sentence. However, this method did not achieve sufficient conversion accuracy. The likely cause is that the total number of rules in the dictionary of Markov chain probabilities of words in a sentence is not saturated. We therefore note that the total number of rules in the dictionary of Markov chain probabilities of words in a bunsetsu is almost saturated. In this paper, we propose a new kana-to-kanji conversion method that uses this Markov chain model. The new method simultaneously detects the boundaries of kana bunsetsu in a sentence and the boundaries of kana words in each bunsetsu by using a Markov chain model of kana words in bunsetsu; it then converts the kana words to candidate kanji-kana words and selects the most likely candidate by using a Markov chain model of kanji-kana words in bunsetsu. In experiments using statistical data from a daily Japanese newspaper, the previously proposed method (Method-B1) and the newly proposed method (Method-B2) are evaluated in terms of conversion accuracy. The results show that Method-B2 achieves higher conversion accuracy than Method-B1 and is effective for kana-to-kanji conversion of non-segmented kana sentences.
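The boundary-detection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a first-order Markov chain, segments a whole kana string into words with a Viterbi-style search, and omits bunsetsu boundaries and the kanji-kana conversion step. The function name `segment`, the markers `<s>` and `</s>`, and all probabilities in `toy_bigram` are hypothetical values chosen for illustration.

```python
import math

BOS, EOS = "<s>", "</s>"  # hypothetical start/end markers

def segment(text, bigram, max_len=4):
    """Return the most probable word segmentation of `text` under a
    first-order Markov chain of words, or None if no path exists.
    `bigram` maps (previous_word, word) to a transition probability."""
    # best[(end, word)] = (log-probability, back-pointer) of the best
    # path that ends with `word` at character position `end`.
    best = {(0, BOS): (0.0, None)}
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            for (pos, prev), (score, _) in list(best.items()):
                p = bigram.get((prev, word))
                if pos != start or p is None:
                    continue  # not adjacent, or transition not in the table
                cand = score + math.log(p)
                key = (end, word)
                if key not in best or cand > best[key][0]:
                    best[key] = (cand, (pos, prev))
    # Pick the best complete path, including the transition to EOS.
    final, final_score = None, -math.inf
    for (pos, prev), (score, _) in best.items():
        p = bigram.get((prev, EOS))
        if pos == len(text) and p is not None:
            total = score + math.log(p)
            if total > final_score:
                final, final_score = (pos, prev), total
    if final is None:
        return None
    # Follow back-pointers to recover the word sequence.
    words, state = [], final
    while best[state][1] is not None:
        words.append(state[1])
        state = best[state][1]
    return words[::-1]

# Toy transition table (made-up probabilities, not from the paper).
toy_bigram = {
    (BOS, "わたし"): 0.5, ("わたし", "は"): 0.6, ("は", "がくせい"): 0.2,
    ("がくせい", "です"): 0.7, ("です", EOS): 0.9,
    (BOS, "わた"): 0.1, ("わた", "しは"): 0.05, ("しは", "がくせい"): 0.1,
}
result = segment("わたしはがくせいです", toy_bigram)
print(result)
```

With this toy table, the path わたし / は / がくせい / です scores higher than the competing path わた / しは / がくせい / です, so the search returns the former.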
Keywords :
Markov processes; natural language processing; statistical analysis; text analysis; Japanese newspaper; Kana-to-kanji conversion method; Markov chain model; Markov chain model of words; Markov chain probabilities; Method-B1; Method-B2; conversion accuracy rate; kana bunsetsu; kanji-kana word; nonsegmented kana sentence; statistical data; Accuracy; Data models; Dictionaries; Markov processes; Probabilistic logic; Probability; Viterbi algorithm; Markov chain model of words in bunsetsu; detecting boundaries of kana words; kana-to-kanji conversion; non-segmented kana sentence; probabilistic language model;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Universal Communication Symposium (IUCS), 2010 4th International
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-7821-7
Type :
conf
DOI :
10.1109/IUCS.2010.5666652
Filename :
5666652