Title :
A robust language model incorporating a substring parser and extended n-grams
Author :
Wright, J.H. ; Jones, G.J.F. ; Lloyd-Thomas, H.
Author_Institution :
Centre for Commun. Res., Bristol Univ., UK
Abstract :
Describes a language model for speech recognition which incorporates a substring parser (to take advantage of syntactic structure covered by a context-free grammar) and extended bigrams (to take advantage of remote dependencies between words). The use of extended bigrams significantly reduces the perplexity, and a distribution clustering algorithm alleviates the additional storage cost. The substring parser is the foundation for training and scoring procedures based on paths at all levels through the syntactic structures, with subtrees linked by bigrams. The word bigram score is therefore absorbed into a grammar framework, consolidating the two kinds of language model, and again a significant reduction in perplexity is observed. The aim is an integrated, robust language model that is adaptive to the speaker.
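The abstract does not give the formulation of the extended bigrams, so the following is only an illustrative sketch of the general idea: a bigram between the current word and the word d positions back, for several distances d, combined by linear interpolation and evaluated by perplexity. The function names, the add-one smoothing, the `<s>` padding token, and the interpolation weights are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def train_extended_bigrams(sentences, max_distance=2):
    """Count (earlier_word, word) pairs at each distance d = 1..max_distance."""
    pair_counts = {d: Counter() for d in range(1, max_distance + 1)}
    context_counts = {d: Counter() for d in range(1, max_distance + 1)}
    vocab = set()
    for sent in sentences:
        # Pad so that every word has a context at every distance (assumption).
        tokens = ["<s>"] * max_distance + list(sent)
        vocab.update(sent)
        for i in range(max_distance, len(tokens)):
            for d in range(1, max_distance + 1):
                pair_counts[d][(tokens[i - d], tokens[i])] += 1
                context_counts[d][tokens[i - d]] += 1
    return pair_counts, context_counts, vocab


def word_prob(word, history, model, weights):
    """Interpolate add-one-smoothed bigram estimates over several distances.

    `weights` maps distance d -> interpolation weight; weights should sum to 1.
    """
    pair_counts, context_counts, vocab = model
    v = len(vocab)
    prob = 0.0
    for d, lam in weights.items():
        prev = history[-d]
        smoothed = (pair_counts[d][(prev, word)] + 1) / (context_counts[d][prev] + v)
        prob += lam * smoothed
    return prob


def perplexity(sentences, model, weights):
    """Perplexity = 2 ** (negative mean log2 probability per word)."""
    max_distance = max(weights)
    log_sum, n_words = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] * max_distance + list(sent)
        for i in range(max_distance, len(tokens)):
            log_sum += math.log2(word_prob(tokens[i], tokens[:i], model, weights))
            n_words += 1
    return 2 ** (-log_sum / n_words)


if __name__ == "__main__":
    corpus = [["show", "me", "flights"], ["show", "me", "fares"]]
    model = train_extended_bigrams(corpus, max_distance=2)
    print(perplexity(corpus, model, {1: 1.0}))          # ordinary bigram only
    print(perplexity(corpus, model, {1: 0.7, 2: 0.3}))  # with a distance-2 term
```

Setting the weights to `{1: 1.0}` reduces the sketch to an ordinary bigram model, which gives a baseline against which the effect of the longer-distance terms can be compared on held-out text.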
Keywords :
computational linguistics; context-free grammars; learning (artificial intelligence); natural languages; speech recognition; context-free grammar; distribution clustering algorithm; extended bigrams; extended n-grams; grammar framework; perplexity; remote dependencies; robust language model; scoring; substring parser; syntactic structure; training; word bigram score; Buildings; Clustering algorithms; Context modeling; Costs; Positrons; Probability; Robustness; Vocabulary
Conference_Title :
Acoustics, Speech, and Signal Processing, 1994. ICASSP-94., 1994 IEEE International Conference on
Conference_Location :
Adelaide, SA
Print_ISBN :
0-7803-1775-0
DOI :
10.1109/ICASSP.1994.389281