• DocumentCode
    312028
  • Title
    Language modeling by string pattern N-gram for Japanese speech recognition
  • Author
    Ito, Akinori ; Kohda, Masaki
  • Author_Institution
    Yamagata Univ., Yonezawa, Japan
  • Volume
    1
  • fYear
    1996
  • fDate
    3-6 Oct 1996
  • Firstpage
    490
  • Abstract
    This paper describes a new, powerful statistical language model based on the N-gram model for Japanese speech recognition. In English, a sentence is written word by word. A sentence in Japanese, on the other hand, has no word boundary characters. A Japanese sentence therefore requires word segmentation by morphemic analysis before a word N-gram can be constructed. We propose an N-gram-based language model that requires no word segmentation. This model uses character string patterns as the units of the N-gram. The string patterns are chosen from the training text according to a statistical criterion. We carried out several experiments comparing the perplexities of the proposed and conventional models, which showed the advantage of our model. For the interest of many readers, we also applied this method to English text. In a preliminary experiment, the proposed method performed better than a conventional word trigram.
  • Keywords
    natural languages; speech recognition; Japanese speech recognition; conventional word trigram; language modeling; morphemic analysis; statistical language model; string pattern N-gram; word segmentation; Dictionaries; Information analysis; Information retrieval; Natural language processing; Natural languages; Probability; Speech analysis; Speech recognition; Spread spectrum communication; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    0-7803-3555-4
  • Type
    conf
  • DOI
    10.1109/ICSLP.1996.607161
  • Filename
    607161