Title :
Bayesian Separation With Sparsity Promotion in Perceptual Wavelet Domain for Speech Enhancement and Hybrid Speech Recognition
Author :
Shao, Yu ; Chang, Chip-Hong
Author_Institution :
Software Exploration R&D Center, China Nat. Pet. Corp., Zhuozhou, China
Date :
3/1/2011
Abstract :
Speech recognition accuracy can be improved by removing noise. However, errors in the estimated signal components can also degrade recognition. This paper presents a framework of wavelet-based techniques to improve automatic speech recognition performance in the presence of background noise. The proposed robust speech recognition system is realized by implementing speech enhancement preprocessing, feature extraction, and a hybrid speech recognizer in the time-frequency space. A perceptual wavelet filterbank that uses a fixed base to imitate the human perceptual modus of speech is developed to capture the most discriminative information in the time-frequency plane. To minimize the mismatch between the training and testing conditions of the classifier, a Bayesian scheme is applied in the wavelet domain to separate the speech and noise components in the proposed iterative speech enhancement algorithm. The nonphonetic information is discarded, while the more critical speech features are extracted and represented by the wavelet coefficients. The denoised wavelet features are fed to a hybrid classifier founded on a hidden Markov model (HMM). The intrinsic limitation of the HMM is overcome by augmenting it with a wavelet support vector machine. This hybrid and hierarchical design paradigm improves recognition performance by combining the advantages of different methods into an integral system. Continuous digit speech recognition experiments conducted with the proposed framework show promising results: the framework significantly improves recognition performance at a low signal-to-noise ratio (SNR) without degrading performance at a high SNR.
Keywords :
Bayes methods; hidden Markov models; speech enhancement; speech recognition; support vector machines; wavelet transforms; Bayesian separation; automatic speech recognition performance; classifier; continuous digit speech recognition; feature extraction; hidden Markov model; human perceptual modus; hybrid speech recognition; integral system; iterative algorithm; noise removal; perceptual wavelet domain; signal-to-noise ratio; sparsity promotion; time-frequency space; wavelet filterbank; wavelet-based techniques; Noise; Speech; Bayesian theory; hidden Markov model (HMM); support vector machine (SVM); wavelet transform
Journal_Title :
Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on
DOI :
10.1109/TSMCA.2010.2069094