DocumentCode
2990740
Title
Speaker sampling for enhanced diversity
Author
Bernstein, Jared ; Kahn, Margaret ; Poza, Tito
Author_Institution
SRI International, Menlo Park, CA
Volume
10
fYear
1985
fDate
31138
Firstpage
1553
Lastpage
1556
Abstract
Assembling a speech database that is both manageably small and sufficiently diverse can be a useful step in the development of speaker-independent speech recognition systems. Yet there have been no data on what kind of speaker sample is required to ensure a group whose speech includes certain phonetic or linguistic traits. The data gathered in this study suggest that some common and important dialect features will not be found even in a large number of speakers if sampling is conducted at a single location. To compile a large pool of prospective speakers, 152 people were each recorded speaking extemporaneously for about one to two minutes; the recordings were then rated by the three authors on fifteen characteristics that fall into three classes: voice quality, manner of speaking, and dialect. Although a wide variety of voice characteristics and manners of speaking were evident among the 152 speakers, the dialect features covered a limited range. We discuss the possible causes of this distribution of characteristics in the sample and some of its implications for collecting adequate databases for speech recognition research.
Keywords
Acoustic noise; Cities and towns; Ear; Frequency; Low-frequency noise; Noise shaping; Pulse shaping methods; Sampling methods; Spectral shape; Speech;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '85.
Type
conf
DOI
10.1109/ICASSP.1985.1168174
Filename
1168174