Title :
How humans perform on a connected-digits data base
Author_Institution :
Institute for Perception TNO, Soesterberg, The Netherlands
Abstract :
Participating members of the international NATO Research Study Group RSG-10 on Speech Processing are presently using a data base of connected digits, spoken in different languages, to facilitate comparison of (connected) word recognition systems in the various countries. To relate "system" results to human performance, we conducted a listening experiment with a representative subset of the same recordings of connected digits. Four Dutch subjects listened to connected groups of 3 to 5 digits, as well as to isolated digits, spoken in English and in Dutch. The English material was spoken by 4 native and 6 non-native speakers of English. Apart from an undisturbed condition, subjects also identified the digit sequences in two noise conditions with speech-to-noise ratios of -3 and -9 dB. At SNR = -3 dB the listeners still performed excellently. There was substantial speaker variation, but no systematic effect of language, sex, or native vs. non-native speakers. Subjects showed a prolonged learning effect and were especially sensitive to tempo under the more difficult (noisy) listening conditions.
Keywords :
Automatic speech recognition; Error analysis; Feedback; Humans; Natural languages; Signal to noise ratio; Speech analysis; Speech recognition; System testing; Vocabulary
Conference_Title :
Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '82.
DOI :
10.1109/ICASSP.1982.1171874