Title :
Single and Multiple F0 Contour Estimation Through Parametric Spectrogram Modeling of Speech in Noisy Environments
Author :
Le Roux, Jonathan ; Kameoka, Hirokazu ; Ono, Nobutaka ; De Cheveigné, Alain ; Sagayama, Shigeki
Author_Institution :
Graduate Sch. of Inf. Sci. & Technol., Univ. of Tokyo
fDate :
5/1/2007
Abstract :
This paper proposes a novel F0 contour estimation algorithm based on a precise parametric description of the voiced parts of speech derived from the power spectrum. The algorithm is able to perform in a wide variety of noisy environments as well as to estimate the F0s of cochannel concurrent speech. The speech spectrum is modeled as a sequence of spectral clusters governed by a common F0 contour expressed as a spline curve. These clusters are obtained by an unsupervised 2-D time-frequency clustering of the power density using a new formulation of the EM algorithm, and their common F0 contour is estimated at the same time. A smooth F0 contour is extracted for the whole utterance, linking together its voiced parts. A noise model is used to cope with nonharmonic background noise, which would otherwise interfere with the clustering of the harmonic portions of speech. We evaluate our algorithm in comparison with existing methods on several tasks, and show 1) that it is competitive on clean single-speaker speech, 2) that it outperforms existing methods in the presence of noise, and 3) that it outperforms existing methods for the estimation of multiple F0 contours of cochannel concurrent speech.
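The clustering idea in the abstract can be illustrated with a much-reduced sketch. The code below is not the paper's formulation: it handles a single frame instead of a full time-frequency plane, has no spline contour, works in linear frequency, and assumes equal-weight Gaussian harmonic clusters of fixed width plus a flat noise floor. The function names and all parameters (`estimate_f0`, `n_harmonics`, `sigma`, `noise_level`) are hypothetical choices made for illustration only.

```python
import math


def gaussian(x, mu, sigma):
    """Gaussian density, used as the shape of one harmonic cluster."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def estimate_f0(freqs, power, f0_init, n_harmonics=5, sigma=10.0,
                noise_level=1e-4, n_iter=30):
    """EM-style refinement of one frame's F0 from its power spectrum.

    Assumptions (not from the source): harmonic n is a Gaussian centred
    at n * f0 with fixed width sigma, and noise is a flat density.
    """
    f0 = f0_init
    for _ in range(n_iter):
        num = 0.0
        den = 0.0
        for x, p in zip(freqs, power):
            # E-step: split this bin's power among the harmonic clusters
            # and the noise floor, in proportion to their densities.
            dens = [gaussian(x, n * f0, sigma)
                    for n in range(1, n_harmonics + 1)]
            total = sum(dens) + noise_level
            for n, d in enumerate(dens, start=1):
                w = p * d / total
                # M-step accumulators: weighted least squares for x = n * f0.
                num += w * n * x
                den += w * n * n
        if den == 0.0:
            break  # no power was assigned to any harmonic
        f0 = num / den
    return f0
```

In this sketch the noise floor absorbs power that lies far from every harmonic, so bins dominated by nonharmonic energy have little influence on the F0 update. That loosely mirrors the role the abstract gives to the noise model.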
Keywords :
expectation-maximisation algorithm; noise; speech processing; splines (mathematics); time-frequency analysis; 2D time-frequency clustering; EM algorithm; F0 contour estimation; cochannel concurrent speech; noisy environments; nonharmonic background noise; single-speaker speech; spectral clusters; speech parametric spectrogram modeling; spline curve; Acoustic noise; Background noise; Clustering algorithms; Hidden Markov models; Personal digital assistants; Signal processing algorithms; Speech analysis; Speech enhancement; Spline; Working environment noise; Acoustic scene analysis; expectation-maximization (EM) algorithm; harmonic-temporal structured clustering (HTC); multipitch estimation; noisy speech; spline F0 contour
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2007.894510