Title :
Phrase bigrams for continuous speech recognition
Author :
Giachin, Egidio P.
Author_Institution :
CSELT, Torino, Italy
Abstract :
In some speech recognition tasks, such as man-machine dialogue systems, the spoken sentences include several recurrent phrases. A bigram language model does not adequately represent these phrases because it underestimates their probability. A better approach consists of modeling phrases as if they were individual dictionary elements. They are inserted as additional entries into the word lexicon, on which bigrams are finally computed. This paper discusses two procedures for automatically determining frequent phrases (within the framework of a probabilistic language model) in an unlabeled training set of written sentences. One procedure is optimal since it minimises the set perplexity. The other, based on information theoretic criteria, ensures that the resulting model has a high statistical robustness. The two procedures are tested on a 762-word spontaneous speech recognition task. They give similar results and provide a moderate improvement over standard bigrams.
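The idea of treating a phrase as a single lexicon entry before computing bigrams can be illustrated with a minimal Python sketch. It is not the paper's method: the two phrase-selection procedures are not reproduced, the phrase is hand-picked, the toy corpus is invented, and the helper names (`merge_phrase`, `bigram_counts`, `bigram_prob`) and the underscore-joined token format are assumptions made for illustration.

```python
from collections import Counter

def merge_phrase(sentence, phrase):
    """Replace each occurrence of `phrase` (a tuple of words) with one joined token."""
    out, i, n = [], 0, len(phrase)
    while i < len(sentence):
        if tuple(sentence[i:i + n]) == phrase:
            out.append("_".join(phrase))  # assumed token format for the new lexicon entry
            i += n
        else:
            out.append(sentence[i])
            i += 1
    return out

def bigram_counts(sentences):
    """Count history tokens and bigrams, adding sentence boundary markers."""
    uni, bi = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            uni[a] += 1
            bi[(a, b)] += 1
    return uni, bi

def bigram_prob(uni, bi, a, b):
    """Maximum-likelihood estimate of P(b | a); no smoothing in this sketch."""
    return bi[(a, b)] / uni[a] if uni[a] else 0.0

# Invented toy corpus; the phrase is chosen by hand, not by either procedure.
corpus = [
    "i would like to go to rome".split(),
    "i would like a ticket".split(),
    "i want to go to milan".split(),
]
phrase = ("i", "would", "like")
merged = [merge_phrase(s, phrase) for s in corpus]
uni, bi = bigram_counts(merged)
```

After merging, `i_would_like` behaves like any other word, so `bigram_prob(uni, bi, "<s>", "i_would_like")` is estimated from ordinary bigram counts over the augmented lexicon. A practical model would also need smoothing, which this sketch omits.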
Keywords :
grammars; information theory; interactive systems; natural languages; optimisation; probability; speech processing; speech recognition; statistical analysis; bigram language model; continuous speech recognition; dictionary elements; frequent phrases; information theoretic criteria; man-machine dialogue systems; phrase bigrams; probabilistic language model; recurrent phrases; set perplexity minimisation; spoken sentences; spontaneous speech recognition; statistical robustness; unlabeled training set; word lexicon; written sentences; Dictionaries; Humans; Laboratories; Man machine systems; Robustness; Spatial databases; Speech recognition; Telecommunications; Telephony; Testing;
Conference_Title :
Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 International Conference on
Conference_Location :
Detroit, MI
Print_ISBN :
0-7803-2431-5
DOI :
10.1109/ICASSP.1995.479405