Title :
Language modeling by string pattern N-gram for Japanese speech recognition
Author :
Ito, Akinori ; Kohda, Masaki
Author_Institution :
Yamagata Univ., Yonezawa, Japan
Abstract :
This paper describes a new, powerful statistical language model, based on the N-gram model, for Japanese speech recognition. In English, a sentence is written word by word. A sentence in Japanese, on the other hand, has no word boundary characters. A Japanese sentence therefore requires word segmentation by morphemic analysis before a word N-gram can be constructed. We propose an N-gram based language model that requires no word segmentation. This model uses character string patterns as the units of the N-gram. The string patterns are chosen from the training text according to a statistical criterion. We carried out several experiments to compare the perplexities of the proposed and conventional models, and the results showed the advantage of our model. For the interest of many readers, we also applied this method to English text. In a preliminary experiment, the proposed method performed better than a conventional word trigram.
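The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the statistical criterion, so a plain frequency threshold stands in for it here. The greedy longest-match segmentation, the add-one smoothed bigram, and the per-character normalisation of perplexity are likewise assumptions made for illustration, and all function names are hypothetical.

```python
import math
from collections import Counter


def choose_patterns(text, max_len=3, min_count=2):
    """Select substrings of length 2..max_len that occur at least min_count times.

    ASSUMPTION: a frequency threshold stands in for the paper's
    (unspecified) statistical selection criterion.
    """
    counts = Counter(
        text[i:i + n]
        for n in range(2, max_len + 1)
        for i in range(len(text) - n + 1)
    )
    return {p for p, c in counts.items() if c >= min_count}


def segment(text, patterns, max_len=3):
    """Split text into units by greedy longest match (an assumed strategy).

    Falls back to a single character when no pattern matches,
    so no word segmentation or dictionary is needed.
    """
    units, i = [], 0
    while i < len(text):
        for n in range(max_len, 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in patterns:
                units.append(piece)
                i += len(piece)
                break
    return units


def train_bigram(units):
    """Collect context and bigram counts over a unit sequence."""
    contexts = ["<s>"] + units[:-1]
    vocab = set(units)
    return vocab, Counter(contexts), Counter(zip(contexts, units))


def per_char_perplexity(units, model):
    """Perplexity per character under an add-one smoothed bigram.

    Normalising by characters rather than units lets models with
    different unit inventories be compared on the same text.
    """
    vocab, context_counts, bigram_counts = model
    v = len(vocab) + 1  # one extra slot for unseen units
    log_prob, prev = 0.0, "<s>"
    for u in units:
        p = (bigram_counts[(prev, u)] + 1) / (context_counts[prev] + v)
        log_prob += math.log2(p)
        prev = u
    n_chars = sum(len(u) for u in units)
    return 2 ** (-log_prob / n_chars)


if __name__ == "__main__":
    train = "きょうはいいてんきですきょうはあめです"
    patterns = choose_patterns(train)
    units = segment(train, patterns)
    model = train_bigram(units)
    print(units)
    print(per_char_perplexity(units, model))
```

Setting `patterns` to an empty set reduces the model to a plain character bigram, which gives a simple baseline for comparing perplexities on the same text.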
Keywords :
natural languages; speech recognition; Japanese speech recognition; conventional word trigram; language modeling; morphemic analysis; statistical language model; string pattern N-gram; word segmentation; Dictionaries; Information analysis; Information retrieval; Natural language processing; Natural languages; Probability; Speech analysis; Speech recognition; Spread spectrum communication; Testing;
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607161