DocumentCode :
3246278
Title :
Semantics synchronous understanding for robust spoken language applications
Author :
Wang, Kuansan
Author_Institution :
Speech Technol. Group, Microsoft Res., Redmond, WA, USA
fYear :
2003
fDate :
30 Nov.-3 Dec. 2003
Firstpage :
640
Lastpage :
645
Abstract :
In this paper, we describe our recent effort in combining speech recognition and understanding into a single-pass decoding process. The goal is to utilize the semantic structure not only to better handle disfluencies and improve the overall understanding accuracy, but also to shorten the response time and achieve higher interactivity. Three related techniques are instrumental in our approach. First, we employ the unified language model (ULM) to incorporate semantic schema into the recognition language model. Second, we extend the search process from word-synchronous to semantic object synchronous (SOS) decoding. Finally, we utilize sequential detection to defer, reject, or accept semantic hypotheses and execute consequent dialog actions while the user's utterance is ongoing. We incorporated these methods into SALT and HTML and conducted comparative user studies based on the MiPad scenarios. The experimental results show that the system can gracefully cope with spontaneous speech and that users prefer the highly interactive nature of such systems, even though there are no significant differences in the task completion rate and the understanding accuracy. However, the interactive interface does allow a more effective visual prompting strategy that contributes to significantly fewer out-of-grammar utterances.
Keywords :
linguistics; sequential estimation; speech recognition; speech-based user interfaces; HTML; SALT; ULM; disfluencies; interactive user interface; interactivity; out of grammar utterances; search process; semantic hypotheses; semantic object synchronous decoding; semantics synchronous understanding; sequential detection; single pass decoding process; spoken language understanding system; spontaneous speech; task completion rate; understanding accuracy; unified language model; visual prompting strategy; Acoustic applications; Acoustic waves; Automatic speech recognition; Decoding; Delay; Natural languages; Pattern recognition; Robustness; Speech processing; Speech recognition
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on
Print_ISBN :
0-7803-7980-2
Type :
conf
DOI :
10.1109/ASRU.2003.1318515
Filename :
1318515