DocumentCode :
166245
Title :
Speech re-synthesis from spectrogram image through sinusoidal modelling
Author :
Garg, Mayank ; Singhal, Roshani
Author_Institution :
Electr. & Electron. Dept., Birla Inst. of Technol. & Sci., Pilani, India
fYear :
2014
fDate :
24-27 Sept. 2014
Firstpage :
2757
Lastpage :
2761
Abstract :
A novel method to extract parameters, i.e. frequencies and their bandwidths, for intelligible speech synthesis is presented. The parameters are extracted from spectrogram images of pre-recorded male and female voice samples and used to re-synthesize speech from sinusoidal signals. Phase continuity is preserved by quantifying the time scale and identifying the phase at temporal boundaries for a given frequency. The amplitudes of the sinusoids follow a Gaussian distribution, and frequency overlap is used to extend the bandwidth from 4 kHz to 6 kHz, improving the clarity of the synthesized speech. The synthesized speech is then passed through a weighting filter to improve the envelope of the re-synthesized time-domain signal. The result sounds synthetic but is noticeably intelligible.
Keywords :
Gaussian distribution; filtering theory; speech synthesis; time-domain analysis; amplitude distribution; frequency 6 kHz; frequency overlap; intelligible speech synthesis; parameter extraction; phase continuity; sinusoidal modelling; sinusoidal signals; spectrogram image; speech resynthesis; time-domain signal resynthesis; time-scale quantification; weighting filter; Bayes methods; Gaussian filter; sinusoidal synthesis; synthetic speech
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on
Conference_Location :
New Delhi
Print_ISBN :
978-1-4799-3078-4
Type :
conf
DOI :
10.1109/ICACCI.2014.6968501
Filename :
6968501