Title :
Introducing linguistic constraints into statistical language modeling
Author_Institution :
Karlsruhe Univ., Germany
Abstract :
Building robust stochastic language models is a major issue in speech recognition systems. Conventional word-based n-gram models do not capture any linguistic constraints inherent in speech. In this paper, the notion of function and content words (closed and open word classes, respectively) is used to provide linguistic knowledge that can be incorporated into language models. Function words are articles, prepositions and personal pronouns. Content words are nouns, verbs, adjectives and adverbs. Based on this class definition, which yields function and content word markers, a new language model is defined. A combination of the word-based model with this new model is introduced. The combined model shows modest improvements in both perplexity and recognition performance.
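The abstract does not state how the marker-based model is defined or how the two models are combined. The following is only a minimal sketch under stated assumptions: the marker model is taken to be a two-class bigram model (marker transition probability times word-given-marker probability), the combination is taken to be linear interpolation with a weight `lam`, and both models use add-one smoothing over a closed vocabulary. The names `FUNCTION_WORDS`, `marker`, `CombinedModel` and the toy corpus are hypothetical and do not come from the paper.

```python
import math
from collections import Counter

# Hypothetical closed-class list (articles, prepositions, personal pronouns).
# The paper's actual word lists are not given in the abstract.
FUNCTION_WORDS = {"the", "a", "an", "in", "on", "of", "to",
                  "i", "you", "he", "she", "it", "we", "they"}


def marker(word):
    """Map a word to 'F' (function), 'C' (content) or 'S' (sentence start)."""
    if word == "<s>":
        return "S"
    return "F" if word in FUNCTION_WORDS else "C"


class CombinedModel:
    """Assumed combination: lam * word bigram + (1 - lam) * marker model."""

    def __init__(self, sentences, lam=0.7):
        self.lam = lam
        self.bigrams = Counter()          # counts of (previous word, word)
        self.contexts = Counter()         # counts of previous word
        self.unigrams = Counter()         # counts of predicted word
        self.marker_bigrams = Counter()   # counts of (previous marker, marker)
        self.marker_contexts = Counter()  # counts of previous marker
        for sent in sentences:
            tokens = ["<s>"] + list(sent)
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1
                self.unigrams[cur] += 1
                self.marker_bigrams[(marker(prev), marker(cur))] += 1
                self.marker_contexts[marker(prev)] += 1
        self.vocab = set(self.unigrams)
        # Number of distinct words and total tokens per marker class.
        self.class_size = Counter(marker(w) for w in self.vocab)
        self.class_count = Counter()
        for word, count in self.unigrams.items():
            self.class_count[marker(word)] += count

    def p_word(self, cur, prev):
        """Add-one smoothed word bigram probability P(cur | prev)."""
        return (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + len(self.vocab))

    def p_marker(self, cur, prev):
        """Assumed marker model: P(m(cur) | m(prev)) * P(cur | m(cur))."""
        m_prev, m_cur = marker(prev), marker(cur)
        # Two possible markers ('F', 'C') for the predicted word.
        p_class = (self.marker_bigrams[(m_prev, m_cur)] + 1) / (self.marker_contexts[m_prev] + 2)
        p_member = (self.unigrams[cur] + 1) / (self.class_count[m_cur] + self.class_size[m_cur])
        return p_class * p_member

    def p(self, cur, prev):
        """Linear interpolation of the two models (an assumption)."""
        return self.lam * self.p_word(cur, prev) + (1 - self.lam) * self.p_marker(cur, prev)

    def perplexity(self, sentences):
        """Per-word perplexity of tokenized sentences under the combined model."""
        log_sum, n = 0.0, 0
        for sent in sentences:
            tokens = ["<s>"] + list(sent)
            for prev, cur in zip(tokens, tokens[1:]):
                log_sum += math.log(self.p(cur, prev))
                n += 1
        return math.exp(-log_sum / n)


# Toy data for illustration only; not from the paper.
toy_corpus = [["the", "cat", "sat", "on", "the", "mat"],
              ["she", "saw", "a", "dog"]]
model = CombinedModel(toy_corpus, lam=0.5)
print(model.perplexity(toy_corpus))
```

In this sketch the marker model backs off to coarse function/content statistics, so it can assign reasonable probability to word pairs the word bigram model has rarely seen. Whether that matches the paper's formulation cannot be determined from the abstract alone.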
Keywords :
grammars; linguistics; natural languages; speech recognition; statistics; stochastic processes; adjectives; adverbs; articles; closed word classes; content words; function words; linguistic constraints; nouns; open word classes; perplexity; personal pronouns; prepositions; recognition performance; robust stochastic language models; speech recognition systems; statistical language modeling; verbs; word markers; word-based n-gram models; Databases; History; Interactive systems; Laboratories; Natural languages; Predictive models; Robustness; Speech recognition; Stochastic processes; Stochastic systems
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607139