Title :
Prosodic knowledge sources for automatic speech recognition
Author :
Vergyri, Dimitra ; Stolcke, Andreas ; Gadde, Venkata R R ; Ferrer, Luciana ; Shriberg, Elizabeth
Author_Institution :
Speech Technology and Research Laboratory, SRI International, Menlo Park, CA, USA
Abstract :
In this work, different prosodic knowledge sources are integrated into a state-of-the-art large vocabulary speech recognition system. Prosody manifests itself on different levels in the speech signal: within the words as a change in phone durations and pitch, in between the words as a variation in the pause length, and beyond the words, correlating with higher linguistic structures and nonlexical phenomena. We investigate three models, each exploiting a different level of prosodic information, in rescoring N-best hypotheses according to how well recognized words correspond to prosodic features of the utterance. Experiments on the Switchboard corpus show word accuracy improvements with each prosodic knowledge source. A further improvement is observed with the combination of all models, demonstrating that they each capture somewhat different prosodic characteristics of the speech signal.
Keywords :
speech recognition; vocabulary; N-best hypotheses; Switchboard corpus; automatic speech recognition; large vocabulary speech recognition; linguistic structures; nonlexical phenomena; pause length; phone durations; prosodic knowledge sources; Context modeling; Energy measurement; Information resources; Laboratories; Mel frequency cepstral coefficient; Predictive models; Stress
Conference_Title :
2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Proceedings
Print_ISBN :
0-7803-7663-3
DOI :
10.1109/ICASSP.2003.1198753
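
Illustration :
The abstract describes rescoring N-best hypotheses with several prosodic knowledge sources and then combining them. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes each hypothesis carries log-domain scores from each knowledge source and that the sources are combined by a weighted sum. The record does not state the combination rule, and every name, score, and weight below (`Hypothesis`, `rescore_nbest`, `"duration"`, `"pause"`, and so on) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Hypothesis:
    """One entry of an N-best list (illustrative structure, not from the paper)."""
    words: List[str]
    # Log-domain scores keyed by knowledge source name (names are hypothetical).
    scores: Dict[str, float] = field(default_factory=dict)


def combined_score(hyp: Hypothesis, weights: Dict[str, float]) -> float:
    """Weighted sum of log scores; sources missing from the hypothesis count as 0."""
    return sum(w * hyp.scores.get(name, 0.0) for name, w in weights.items())


def rescore_nbest(nbest: List[Hypothesis], weights: Dict[str, float]) -> List[Hypothesis]:
    """Return the N-best list sorted from best to worst combined score."""
    return sorted(nbest, key=lambda h: combined_score(h, weights), reverse=True)


# Made-up scores for two competing hypotheses of the same utterance.
nbest = [
    Hypothesis(["i", "see", "okay"],
               {"acoustic": -120.0, "lm": -14.0, "duration": -9.0, "pause": -4.0}),
    Hypothesis(["i", "see", "ok", "a"],
               {"acoustic": -119.0, "lm": -14.0, "duration": -12.0, "pause": -6.0}),
]

# Assumed weights; in practice these would be tuned on held-out data.
baseline_weights = {"acoustic": 1.0, "lm": 8.0}
prosody_weights = {"acoustic": 1.0, "lm": 8.0, "duration": 1.0, "pause": 1.0}

best_baseline = rescore_nbest(nbest, baseline_weights)[0]
best_prosody = rescore_nbest(nbest, prosody_weights)[0]
```

With these invented numbers, the baseline weights rank the second hypothesis first, while adding the hypothetical duration and pause scores moves the first hypothesis to the top. This shows how an additional knowledge source can change which hypothesis is selected.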