Title :
Automatically finding semantically consistent n-grams to add new words in LVCSR systems
Author :
Lecorvé, Gwénolé ; Gravier, Guillaume ; Sébillot, Pascale
Author_Institution :
IRISA, Rennes, France
Abstract :
This paper presents a new method to automatically add n-grams containing out-of-vocabulary (OOV) words to a baseline language model (LM), such that these n-grams are grammatically correct and consistent with the meaning of the OOV words. First, the method determines the word sequences, i.e., n-grams, in which a given OOV word is used in the most semantically consistent way. Then, the conditional probabilities of these n-grams are computed. To do this, semantic relations between words are used to associate each OOV word with several equivalent in-vocabulary (IV) words. Based on these IV words, n-grams from the baseline LM are reused to find the word sequences to add and to compute their probabilities. After augmenting the vocabulary and running recognition, experiments show that the method yields word error rate (WER) improvements comparable to those obtained with a state-of-the-art open-vocabulary LM.
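The core idea of reusing baseline n-grams through IV equivalents can be illustrated with a minimal sketch. It assumes a baseline LM stored as a dictionary from n-gram tuples to conditional probabilities, and a hypothetical `equivalents` mapping from IV words to non-negative semantic-similarity weights (how such weights are obtained is not specified here). Each baseline n-gram containing an equivalent IV word yields a new n-gram with the OOV word substituted in, and its probability is taken as a weighted average of the source probabilities. The function name `oov_ngrams` and this averaging scheme are illustrative assumptions, not the authors' exact formulation.

```python
from collections import defaultdict


def oov_ngrams(baseline, oov, equivalents):
    """Hypothetical sketch: derive n-grams for an OOV word from the baseline
    n-grams of semantically equivalent in-vocabulary (IV) words.

    baseline:    dict mapping n-gram tuples to P(last word | history)
    oov:         the out-of-vocabulary word to add
    equivalents: dict mapping IV word -> non-negative weight (assumed to come
                 from some semantic-relation measure)
    Returns a dict of new n-grams containing `oov`, each scored by the
    weight-normalized sum of the probabilities of its source n-grams.
    """
    total = sum(equivalents.values())
    if total <= 0:
        return {}
    new = defaultdict(float)
    for ngram, prob in baseline.items():
        for iv, weight in equivalents.items():
            if iv in ngram:
                # Replace every occurrence of this IV word by the OOV word.
                substituted = tuple(oov if w == iv else w for w in ngram)
                new[substituted] += (weight / total) * prob
    return dict(new)


baseline = {("the", "car"): 0.2, ("the", "truck"): 0.1, ("car", "drives"): 0.5}
print(oov_ngrams(baseline, "automobile", {"car": 0.75, "truck": 0.25}))
```

In this example, ("the", "automobile") receives about 0.175 (0.75 × 0.2 + 0.25 × 0.1) and ("automobile", "drives") about 0.375. The sketch does not renormalize: when the OOV word is the predicted word, the added mass makes the distribution for that history sum to more than one, so a real back-off LM would also need its probabilities and back-off weights recomputed.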
Keywords :
natural language processing; probability; speech recognition; vocabulary; LVCSR systems; baseline language model; consistent n-grams; large vocabulary continuous speech recognition systems; open vocabulary LM; out-of-vocabulary words; word sequences; Adaptation models; Context; History; Semantics; Speech; Speech recognition; Vocabulary; Automatic speech recognition; language modeling; vocabulary adaptation;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
Print_ISBN :
978-1-4577-0538-0
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2011.5947398