Title :
Enhancing the Intelligibility of Statistically Generated Synthetic Speech by Means of Noise-Independent Modifications
Author :
Erro, Daniel ; Zorila, Tudor-Catalin ; Stylianou, Yannis
Author_Institution :
Aholab, Univ. of the Basque Country (UPV/EHU), Bilbao, Spain
Abstract :
When speaking devices such as smartphones, tablet-PCs, or GPS systems are used in noisy outdoor environments, speech intelligibility drops significantly, and the drop is even more pronounced for synthetic speech. This article describes how a statistical parametric speech synthesis system trained on an ordinary synthesis database can be designed to generate highly intelligible speech, even at very low signal-to-noise ratios. Using a simple and flexible vocoder based on a full-band harmonic model, the proposed system applies deterministic, noise-independent modifications at several levels: speaking rate, average fundamental frequency level and range, energy contour over time, formant sharpness, and intensity of specific spectral bands. The intelligibility achieved by the system has been evaluated by means of a large-scale subjective test, whose results show that the proposed approach clearly outperforms a state-of-the-art reference TTS system and, in some conditions, even unmodified natural speech. Compared with alternative systems evaluated in the same framework, the proposed system performs best in the scenarios with the lowest signal-to-noise ratios. Finally, the impact of the proposed modifications on naturalness, quality, and similarity to the original natural voice is quantified by means of a subjective test.
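The abstract names two of the prosodic modifications (F0 level/range and speaking rate) but does not say how they are computed. The sketch below is purely illustrative and is not the authors' implementation. It assumes frame-wise vocoder parameters, an F0 contour in Hz with 0 marking unvoiced frames, and log-domain scaling around the mean voiced log-F0. The function names `modify_f0` and `change_speaking_rate`, the default factor values, and the nearest-frame resampling used for time stretching are all hypothetical choices.

```python
import numpy as np


def modify_f0(f0, level_factor=1.2, range_factor=1.5):
    """Hypothetical sketch: raise the average F0 level and widen the F0 range.

    f0: frame-wise F0 in Hz; 0 marks unvoiced frames (assumed convention).
    Scaling is applied in the log domain around the mean log-F0 of voiced frames.
    """
    f0 = np.asarray(f0, dtype=float)
    out = np.zeros_like(f0)
    voiced = f0 > 0
    if not voiced.any():
        return out  # fully unvoiced input: nothing to modify
    log_f0 = np.log(f0[voiced])
    mean = log_f0.mean()
    # Expand deviations from the mean (range), then shift the mean (level).
    new_log = mean + np.log(level_factor) + range_factor * (log_f0 - mean)
    out[voiced] = np.exp(new_log)
    return out


def change_speaking_rate(frames, rate_factor=0.8):
    """Hypothetical sketch: slow down (rate_factor < 1) or speed up (> 1)
    by repeating or dropping whole parameter frames along axis 0
    (nearest-frame resampling, an assumed and deliberately simple method)."""
    frames = np.asarray(frames)
    n = frames.shape[0]
    m = max(1, int(round(n / rate_factor)))
    idx = np.round(np.linspace(0, n - 1, m)).astype(int)
    return frames[idx]


if __name__ == "__main__":
    print(modify_f0(np.array([100.0, 0.0, 200.0])))
    print(change_speaking_rate(np.arange(10), rate_factor=0.5).shape)
```

Scaling in the log domain keeps the geometric mean of the voiced frames proportional to `level_factor`, and it makes `range_factor` act as an exponent on F0 ratios. Because a harmonic-model vocoder is driven frame by frame, such parameter-level changes could in principle be applied before resynthesis.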
Keywords :
speech synthesis; vocoders; GPS systems; TTS system; energy contour; full-band harmonic model; highly intelligible speech; intelligibility; natural voice; noise-independent modifications; ordinary synthesis database; signal-to-noise ratios; smartphones; spectral bands; statistical parametric speech synthesis system; statistically generated synthetic speech; subjective test; tablet-PC; unmodified natural speech; vocoder; Harmonic analysis; IEEE transactions; Noise; Speech; Speech enhancement; Vocoders; Harmonic model; speech enhancement; speech intelligibility in noise; statistical parametric speech synthesis; voice transformation;
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
DOI :
10.1109/TASLP.2014.2361022