DocumentCode
387962
Title
Speech analysis/Synthesis based on matching the synthesized and the original representations in the auditory nerve level
Author
Ghitza, Oded
Author_Institution
AT&T Bell Laboratories, Murray Hill, NJ, USA
Volume
11
fYear
1986
fDate
31503
Firstpage
1995
Lastpage
1998
Abstract
Traditional speech analysis/synthesis techniques are designed to produce synthesized speech whose spectrum (or waveform) is as close as possible to that of the original. Instead, it is suggested that the in-synchrony-bands spectrum measures (Ghitza, ICASSP-85, Tampa, FL, Vol. 2, p. 505) of the synthetic and the original speech be matched. This concept has been used in conjunction with a sinusoidal-representation type of speech analysis/synthesis (McAulay and Quatieri, Lincoln Laboratory Technical Report 693, May 1985). Based on informal listening, the resulting speech is natural (with some tonal artifact) and highly intelligible in both quiet and noisy environments. The same performance is obtained with two overlapping superposed speech waveforms, music waveforms, and speech against a musical background. These results demonstrate the adequacy of the in-synchrony-bands measure in selecting the perceptually meaningful frequency regions of the stimulus spectra. Moreover, the inherent dominance property of this measure reduces the number of sinusoidal components needed for synthesis by approximately 70 percent, offering the potential for a reduced data rate.
Keywords
Acoustic measurements; Frequency estimation; Frequency measurement; Frequency synchronization; Frequency synthesizers; Laboratories; Speech analysis; Speech synthesis; Vocoders; Working environment noise
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '86.
Type
conf
DOI
10.1109/ICASSP.1986.1169191
Filename
1169191