DocumentCode
302105
Title
Variable-order N-gram generation by word-class splitting and consecutive word grouping
Author
Masataki, Hirokazu ; Sagisaka, Yoshinori
Author_Institution
ATR Interpreting Telephony Res. Labs., Kyoto, Japan
Volume
1
fYear
1996
fDate
7-10 May 1996
Firstpage
188
Abstract
In this paper, a generation scheme for variable-order N-grams is proposed to attain reliable statistical constraints from a given language corpus. Starting from part-of-speech (POS) bigrams, the proposed scheme creates variable-order N-grams by splitting a POS into finer groups and by adding frequent consecutive word sequences as word-classes. This word-class splitting and consecutive word grouping are carried out incrementally by minimizing the total entropy. Experiments showed that the perplexity of the proposed model on the test corpus is lower than that of a conventional trigram and that the model requires far fewer statistical parameters. Applied to speech recognition, the model yields a better recognition rate than conventional bigrams.
Keywords
minimum entropy methods; natural languages; speech recognition; statistical analysis; POS bigrams; consecutive word grouping; consecutive word sequences; language corpus; perplexity; speech recognition; statistical constraints; test corpus; total entropy; variable-order N-gram generation; word-class splitting; Bellows; Data mining; Entropy; History; Probability; Smoothing methods; Speech recognition; Statistical distributions; Testing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
Conference_Location
Atlanta, GA
ISSN
1520-6149
Print_ISBN
0-7803-3192-3
Type
conf
DOI
10.1109/ICASSP.1996.540322
Filename
540322