DocumentCode :
312114
Title :
The influence of bigram constraints on word recognition by humans: implications for computer speech recognition
Author :
Cole, Ronald A. ; Yan, Yonghong ; Bailey, Troy
Author_Institution :
Center for Spoken Language Understanding, Oregon Graduate Inst. of Sci. & Technol., Beaverton, OR, USA
Volume :
2
fYear :
1996
fDate :
3-6 Oct 1996
Firstpage :
829
Abstract :
The gap between human and machine performance on speech recognition tasks is still very large. Machine recognition of words in telephone conversations is only slightly better than 50%, based on results reported on the Switchboard corpus by leading researchers using state-of-the-art HMM systems. We know from our own experience that human perception typically delivers much more accurate word recognition over the telephone. Why is there such a large gap between machine and human performance, and what can be done to close this gap? One way to address this question is to study the sources of linguistic information in the speech signal that are known to be important for word recognition, and measure how well machine systems utilize this information relative to humans. We measured word recognition performance of listeners presented with words from the Switchboard corpus. Stimuli consisted of actual utterances excised from the Switchboard corpus, high-quality recordings of utterances that occurred in Switchboard conversations, and recordings of word sequences with zero, medium and high bigram probabilities based on a language model computed from transcriptions of the Switchboard corpus. The results show that human listeners are very good at recognizing words in the absence of word sequence constraints, and that statistical language models fail to capture much of the high-level linguistic information needed to recognize words in fluent speech. The results are discussed in terms of their implications for current approaches to acoustic and language modeling in computer speech recognition.
Keywords :
human factors; natural language interfaces; speech recognition; word processing; HMM systems; Switchboard conversations; Switchboard corpus; bigram constraints; bigram probabilities; computer speech recognition; fluent speech; high level linguistic information; human perception; human performance; linguistic information; machine performance; machine systems; speech recognition tasks; speech signal; statistical language models; telephone conversations; utterances; word recognition; word recognition performance; word sequences; Degradation; Error analysis; Guns; Hidden Markov models; Humans; Natural languages; Probability; Speech recognition; Telephony; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
Type :
conf
DOI :
10.1109/ICSLP.1996.607729
Filename :
607729