DocumentCode :
290055
Title :
A robust language model incorporating a substring parser and extended n-grams
Author :
Wright, J.H. ; Jones, G.J.F. ; Lloyd-Thomas, H.
Author_Institution :
Centre for Commun. Res., Bristol Univ., UK
Volume :
i
fYear :
1994
fDate :
19-22 Apr 1994
Abstract :
Describes a language model for speech recognition which incorporates a substring parser (to take advantage of syntactic structure covered by a context-free grammar) and extended bigrams (to take advantage of remote dependencies between words). The use of extended bigrams significantly reduces the perplexity and a distribution clustering algorithm alleviates the additional storage cost. The substring parser is the foundation for training and scoring procedures based on paths at all levels through the syntactic structures, with subtrees linked by bigrams. The word bigram score is therefore absorbed into a grammar framework, consolidating the two kinds of language model, and again a significant reduction in perplexity is observed. The aim is an integrated, robust language model that is adaptive to the speaker.
Keywords :
computational linguistics; context-free grammars; learning (artificial intelligence); natural languages; speech recognition; context-free grammar; distribution clustering algorithm; extended bigrams; extended n-grams; grammar framework; perplexity; remote dependencies; robust language model; scoring; speech recognition; substring parser; syntactic structure; training; word bigram score; Buildings; Clustering algorithms; Context modeling; Costs; Natural languages; Positrons; Probability; Robustness; Speech recognition; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 1994. ICASSP-94., 1994 IEEE International Conference on
Conference_Location :
Adelaide, SA
ISSN :
1520-6149
Print_ISBN :
0-7803-1775-0
Type :
conf
DOI :
10.1109/ICASSP.1994.389281
Filename :
389281