Title :
Two-step generation of variable-word-length language model integrating local and global constraints
Author :
Matsunaga, Shoichi ; Sagayama, Shigeki
Author_Institution :
NTT Human Interface Labs., Kanagawa, Japan
Abstract :
This paper proposes a two-step method for generating a variable-length class-based language model that integrates local and global constraints. In the first step, an initial class set is recursively designed using local constraints. Word elements for each class are determined using Kullback divergence and total entropy. In the second step, the word classes are recreated recursively and the words iteratively, by grouping consecutive words to generate longer units and by splitting the initial classes into finer classes. These operations in the second step are carried out selectively, taking into account local and global constraints on the basis of a minimum entropy criterion. Experiments showed that the perplexity of the proposed initial class set is lower than that of the conventional part-of-speech class, and the perplexity of the variable-word-length model consequently becomes lower. Furthermore, this two-step model generation approach greatly reduces the training time.
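As a rough illustration of two quantities named in the abstract, the sketch below computes a Kullback divergence between two discrete distributions and a perplexity from per-word probabilities. The function names and the toy distributions are hypothetical and chosen only for this example; this is not the authors' class-design procedure, which the abstract does not specify in detail.

```python
import math

def kl_divergence(p, q):
    """Kullback divergence D(p || q) in bits.

    p and q are dicts mapping items to probabilities. This sketch assumes
    q[w] > 0 wherever p[w] > 0 (no smoothing is applied).
    """
    total = 0.0
    for w, pw in p.items():
        if pw > 0.0:
            total += pw * math.log2(pw / q[w])
    return total

def perplexity(word_probs):
    """Perplexity 2**H, where H is the mean negative log2 probability per word."""
    h = -sum(math.log2(x) for x in word_probs) / len(word_probs)
    return 2.0 ** h

# Hypothetical successor-word distributions for two vocabulary items.
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.75}
d = kl_divergence(p, q)

# Hypothetical model probabilities assigned to four test words.
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```

A small divergence between the context distributions of two words suggests they behave alike and could share a class, while a lower perplexity on held-out text indicates a better-fitting model; how the paper combines these with total entropy is not stated in the abstract.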
Keywords :
grammars; iterative methods; minimum entropy methods; natural languages; speech processing; speech recognition; Kullback divergence; class-based language model; experiments; global constraints; iterative word-class; large vocabulary continuous speech recognition; local constraints; minimum entropy criterion; part-of-speech class; perplexity; total entropy; training time reduction; two-step model generation; variable-word-length language model; Broadcasting; Entropy; Gratings; Humans; Natural languages; Power generation; Speech recognition; Testing; Vocabulary;
Conference_Title :
Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7803-4428-6
DOI :
10.1109/ICASSP.1998.675360