Language modeling by string pattern N-gram for Japanese speech recognition

Author

Ito, Akinori ; Kohda, Masaki

Author_Institution

Yamagata Univ., Yonezawa, Japan

Volume

1

fYear

1996

fDate

3-6 Oct 1996

Firstpage

490

Abstract

This paper describes a new, powerful statistical language model based on the N-gram model for Japanese speech recognition. In English, a sentence is written word by word. In Japanese, on the other hand, a sentence has no word boundary characters. Therefore, a Japanese sentence requires word segmentation by morphemic analysis before a word N-gram can be constructed. We propose an N-gram-based language model that requires no word segmentation. This model uses character string patterns as the units of the N-gram. The string patterns are chosen from the training text according to a statistical criterion. We carried out several experiments comparing the perplexities of the proposed and conventional models, which showed the advantage of our model. For the interest of many readers, we also applied this method to English text. In a preliminary experiment, the proposed method performed better than the conventional word trigram.
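The abstract does not state which statistical criterion selects the string patterns, so the Python sketch below is only an illustration under stated assumptions, not the authors' method. The frequency threshold in `select_patterns`, the greedy longest-match segmentation, the add-one-smoothed bigram, and every function and class name are hypothetical. It shows the general idea of taking N-gram units directly from unsegmented character strings (no morphemic analysis) and comparing models by perplexity; this sketch normalizes by the number of characters so that models with different unit inventories are measured on the same scale.

```python
import math
from collections import Counter


def select_patterns(text, max_len=4, min_count=2):
    """Hypothetical criterion (assumption): keep every substring of length
    2..max_len occurring at least min_count times in the training text."""
    counts = Counter(
        text[i:i + n]
        for n in range(2, max_len + 1)
        for i in range(len(text) - n + 1)
    )
    patterns = {s for s, c in counts.items() if c >= min_count}
    patterns.update(text)  # every single training character is also a unit
    return patterns


def segment(text, patterns, max_len=4):
    """Greedy longest-match split into pattern units; no word boundaries needed.
    A single character is always accepted, so unseen characters still segment."""
    units, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in patterns:
                units.append(piece)
                i += n
                break
    return units


class PatternBigram:
    """Bigram model whose units are string patterns rather than words."""

    def __init__(self, train_text, max_len=4, min_count=2):
        self.max_len = max_len
        self.patterns = select_patterns(train_text, max_len, min_count)
        units = ["<s>"] + segment(train_text, self.patterns, max_len)
        self.unigrams = Counter(units)
        self.bigrams = Counter(zip(units, units[1:]))
        # +1 reserves some probability mass for units unseen in training.
        self.vocab_size = len(self.patterns) + 1

    def prob(self, prev, unit):
        # Add-one (Laplace) smoothing over the pattern vocabulary.
        return (self.bigrams[(prev, unit)] + 1) / (self.unigrams[prev] + self.vocab_size)

    def char_perplexity(self, text):
        # Perplexity normalized per character, not per unit.
        log_prob, prev = 0.0, "<s>"
        for unit in segment(text, self.patterns, self.max_len):
            log_prob += math.log(self.prob(prev, unit))
            prev = unit
        return math.exp(-log_prob / len(text))
```

For example, `PatternBigram("abcabcabcxyz").char_perplexity("abcabc")` is lower than the value for `"zyxzyx"`, because the first string reuses frequent patterns from the training text. Because Python strings are Unicode, the same code accepts Japanese text unchanged, although a realistic model would need a principled selection criterion and better smoothing.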

Keywords

natural languages; speech recognition; Japanese speech recognition; conventional word trigram; language modeling; morphemic analysis; statistical language model; string pattern N-gram; word segmentation; Dictionaries; Information analysis; Information retrieval; Natural language processing; Natural languages; Probability; Speech analysis; Speech recognition; Spread spectrum communication; Testing;

fLanguage

English

Publisher

IEEE

Conference_Titel

Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on

Conference_Location

Philadelphia, PA

Print_ISBN

0-7803-3555-4

Type

conf

DOI

10.1109/ICSLP.1996.607161

Filename

607161