Title :
Speech analysis/Synthesis based on matching the synthesized and the original representations in the auditory nerve level
Author_Institution :
AT&T Bell Laboratories, Murray Hill, NJ, USA
Abstract :
Traditional speech analysis/synthesis techniques are designed to produce synthesized speech with a spectrum (or waveform) that is as close as possible to the original. Instead, it is suggested to match the in-synchrony-bands spectrum measures (Ghitza, ICASSP-85, Tampa, FL, Vol. 2, p. 505) of the synthetic and the original speech. This concept has been used in conjunction with speech analysis/synthesis based on a sinusoidal representation (McAulay and Quatieri, Lincoln Laboratory Technical Report 693, May 1985). Based on informal listening, the resulting speech is natural (with some tonal artifacts) and highly intelligible in both quiet and noisy environments. The same performance is obtained with two superposed speech waveforms, music waveforms, and speech with a musical background. These results demonstrate the adequacy of the in-synchrony-bands measure in selecting the perceptually meaningful frequency regions of the stimulus spectra. Moreover, the inherent dominance property of this measure reduces the number of sinusoidal components needed for synthesis by approximately 70 percent, offering the potential for a reduced data rate.
Keywords :
Acoustic measurements; Frequency estimation; Frequency measurement; Frequency synchronization; Frequency synthesizers; Laboratories; Speech analysis; Speech synthesis; Vocoders; Working environment noise
Conference_Title :
Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '86.
DOI :
10.1109/ICASSP.1986.1169191