Title :
An evaluation of statistical language modeling for speech recognition using a mixed category of both words and parts-of-speech
Author :
Wakita, Yumi ; Kawai, Jun ; Iida, Hitoshi
Author_Institution :
ATR Interpreting Telecommun. Res. Lab., Kyoto, Japan
Abstract :
In our previous paper, we proposed a mixed category of words and parts-of-speech names (the MWP category) based on class N-gram modeling (Kawai et al., 1995). However, we had not confirmed the effectiveness of the MWP category. In this paper, we evaluate it. First, we use “coverage of word and category sequences on open data” and “perplexity on training data” as evaluation measures, and confirm that the characteristics of parts-of-speech are useful for building a suitable class N-gram model. Speech recognition experiments further confirm that class N-gram modeling with the MWP category improves the recognition rate for open data that shows low coverage of word and category sequences, without greatly decreasing the recognition rate for closed data.
Keywords :
computational linguistics; natural languages; probability; speech recognition; statistical analysis; MWP category; category sequences; class N-gram modeling; closed data; parts-of-speech; statistical language modeling; words; Character generation; Computer aided analysis; Entropy; Frequency; Predictive models; Smoothing methods; Testing; Training data
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607171