Speaker sampling for enhanced diversity

Author

Bernstein, Jared ; Kahn, Margaret ; Poza, Tito

Author_Institution

SRI International, Menlo Park, CA

Volume

10

fYear

1985

fDate

April 1985

Firstpage

1553

Lastpage

1556

Abstract

Assembling a speech database that is both manageably small and sufficiently diverse can be a useful step in the development of speaker-independent speech recognition systems. Yet there has been no data on what kind of speaker sample might be required to ensure a group whose speech includes certain phonetic or linguistic traits. The data gathered in this study suggest that some common and important dialect features will not be found even in a large number of speakers if sampling is conducted at a single location. In order to compile a large pool of prospective speakers, 152 people were recorded for about one or two minutes speaking extemporaneously; the recordings were then rated by the three authors according to fifteen characteristics that form three classes: voice quality, manner of speaking, and dialect. Although a wide variety of voice characteristics and manners of speaking were evident among the 152 speakers, the dialect features covered a limited range. We discuss the possible causes of this distribution of characteristics in the sample and some of its implications for collecting adequate databases for speech recognition research.

Keywords

Acoustic noise; Cities and towns; Ear; Frequency; Low-frequency noise; Noise shaping; Pulse shaping methods; Sampling methods; Spectral shape; Speech;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '85.

Type

conf

DOI

10.1109/ICASSP.1985.1168174

Filename

1168174