DocumentCode :
3327834
Title :
Bayesian modelling of the speech spectrum using mixture of Gaussians
Author :
Zolfaghari, Parham ; Watanabe, Shinji ; Nakamura, Atsushi ; Katagiri, Shigeru
Author_Institution :
NTT Commun. Sci. Labs, NTT Corp., Kyoto, Japan
Volume :
1
fYear :
2004
fDate :
17-21 May 2004
Abstract :
This paper presents a method for modelling the speech spectral envelope using a mixture of Gaussians (MoG). A novel variational Bayesian (VB) framework for Gaussian mixture modelling of a histogram enables the derivation of an objective function that can be used to simultaneously optimise both the model parameter distributions and the model structure. A histogram representation of the STRAIGHT spectral envelope, which is free of glottal excitation information, is parameterised using this MoG model. This results in a parameterisation scheme that models purely the vocal tract resonant characteristics. Maximum likelihood (ML) and VB solutions of the mixture model on histogram data are found using an iterative algorithm. ML-MoG and VB-MoG spectral modelling are compared using spectral distortion measures and mean opinion scores (MOS). The main advantages of VB-MoG highlighted in this paper are better modelling with fewer Gaussians in the mixture, which gives a closer correspondence between Gaussians and formant-like peaks, and an objective measure of the number of Gaussians required to best fit the spectral envelope.
Keywords :
Bayes methods; Gaussian distribution; iterative methods; maximum likelihood estimation; optimisation; signal representation; spectral analysis; speech processing; statistical analysis; variational techniques; Bayesian modelling; ML-MoG; MOS; STRAIGHT spectral envelope; VB-MoG; formant-like peaks; histogram representation; iterative algorithm; maximum likelihood solution; mean opinion scores; mixture of Gaussians; model parameter distributions; model structure; objective function; parameterisation scheme; spectral distortion measures; speech spectral envelope; speech spectrum; variational Bayesian framework; vocal tract resonant characteristics; Bayesian methods; Cepstral analysis; Distortion measurement; Gaussian processes; Histograms; Propagation losses; Resonance; Speech coding; Speech recognition
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-8484-9
Type :
conf
DOI :
10.1109/ICASSP.2004.1326045
Filename :
1326045