Title :
Pitch estimation from speech using Grating Compression Transform on Modified Group-Delay-gram
Author :
Sebastian, Jilt ; Manoj Kumar, P.A. ; Murthy, Hema A.
Author_Institution :
Indian Inst. of Technol. Madras, Chennai, India
Date :
Feb. 27 2015-March 1 2015
Abstract :
This work presents an approach to pitch extraction based on the Grating Compression Transform (GCT) applied to a harmonically enhanced Modified Group-Delay-gram (Modgdgram). It exploits the peakedness and high-resolution properties of group delay functions, together with the ability of the GCT to smear harmonically related components in the spectrum and to track pitch across frames. The power spectrum of the signal is divided by a cepstrally smoothed version of itself to obtain a flattened spectrum. Because pitch produces picket-fence harmonics in the flattened spectrum, the spectrum resembles a sinusoid corrupted by noise. It is therefore treated as a sinusoidal signal, and modified group delay based analysis is performed on it. Localized time-frequency regions of the Modgdgram are used for GCT computation. Peak picking is performed in the resulting rate-scale domain, and pitch dynamics are used to finalize the pitch values. The proposed algorithm, without any post-processing, is compared with the traditional GCT computed on the magnitude spectrum and with the modified group delay alone. Both natural and synthetic speech are considered for evaluation, and an overall improvement of 27% is obtained in the error measures. Finally, two commonly used advanced algorithms that include post-processing steps are also considered, and the results obtained are comparable.
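The spectral flattening step and the patch-wise GCT described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `flatten_spectrum` and `gct_magnitude`, the FFT size, the Hamming and Hann windows, and the lifter length `n_lifter` are all assumptions chosen for illustration, and the modified group delay stage is omitted entirely. The GCT is taken here to be a 2-D Fourier transform of a windowed time-frequency patch.

```python
import numpy as np


def flatten_spectrum(frame, n_fft=1024, n_lifter=20):
    """Divide the power spectrum of one frame by its cepstrally smoothed envelope.

    n_fft and n_lifter are illustrative values, not taken from the paper.
    """
    windowed = frame * np.hamming(len(frame))
    # Small constant avoids log(0) in silent or perfectly periodic frames.
    power = np.abs(np.fft.fft(windowed, n_fft)) ** 2 + 1e-12

    # Real cepstrum of the log power spectrum.
    cepstrum = np.fft.ifft(np.log(power)).real

    # Low-time lifter: keep the first n_lifter coefficients and their
    # symmetric counterparts, which retains only the slowly varying envelope.
    lifter = np.zeros(n_fft)
    lifter[:n_lifter] = 1.0
    lifter[n_fft - n_lifter + 1:] = 1.0
    envelope = np.exp(np.fft.fft(cepstrum * lifter).real)

    # Flattened spectrum, non-negative frequencies only.
    return (power / envelope)[: n_fft // 2 + 1]


def gct_magnitude(patch):
    """Magnitude of the 2-D Fourier transform of a localized time-frequency patch.

    patch is a 2-D array (frequency bins x frames); the mean is removed and a
    separable Hann window applied before the transform (both are assumptions).
    """
    patch = patch - patch.mean()
    window = np.outer(np.hanning(patch.shape[0]), np.hanning(patch.shape[1]))
    return np.abs(np.fft.fftshift(np.fft.fft2(patch * window)))
```

With a lifter much shorter than the pitch period in samples, the envelope captures the vocal-tract shape but not the harmonic ripple, so the division leaves the picket-fence harmonics standing on a roughly flat floor. In a full pipeline, the dominant off-center peak of `gct_magnitude` over a patch would then be the candidate from which pitch is read.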
Keywords :
data compression; speech coding; time-frequency analysis; GCT computation; Modgdgram; cepstrally-smoothened version; error measures; flattened spectrum; grating compression transform; group delay functions; harmonically-enhanced modified group-delay-gram; harmonically-related components; high-resolution properties; localized time-frequency regions; magnitude spectrum; modified group delay-based analysis; natural speech; peak picking; picket-fence harmonics; pitch dynamics; pitch estimation; pitch extraction; pitch values; rate-scale domain; signal power spectrum; sinusoidal signal; synthetic speech; Delays; Estimation; Fourier transforms; Gratings; Spectrogram; Speech; Time-frequency analysis;
Conference_Title :
Communications (NCC), 2015 Twenty First National Conference on
Conference_Location :
Mumbai
DOI :
10.1109/NCC.2015.7084899