• DocumentCode
    1347555
  • Title
    Bayesian Separation With Sparsity Promotion in Perceptual Wavelet Domain for Speech Enhancement and Hybrid Speech Recognition
  • Author
    Shao, Yu ; Chang, Chip-Hong
  • Author_Institution
    Software Exploration R&D Center, China Nat. Pet. Corp., Zhuozhou, China
  • Volume
    41
  • Issue
    2
  • fYear
    2011
  • fDate
    3/1/2011 12:00:00 AM
  • Firstpage
    284
  • Lastpage
    293
  • Abstract
    Speech recognition accuracy can be improved by the removal of noise. However, errors in the estimated signal components can also obscure the recognition. This paper presents a framework of wavelet-based techniques to harness the automatic speech recognition performance in the presence of background noise. The proposed robust speech recognition system is realized by implementing speech enhancement preprocessing, feature extraction, and a hybrid speech recognizer in the time-frequency space. A perceptual wavelet filterbank using a fixed base to imitate the human perceptual modus of speech is developed to capture the most discriminative information in the time-frequency plane. To minimize the mismatch between the training and testing conditions of the classifier, a Bayesian scheme is applied in a wavelet domain to separate the speech and noise components in the proposed iterative speech enhancement algorithm. The nonphonetic information is discarded while the more critical speech features are extracted and represented by the wavelet coefficients. The denoised wavelet features are fed to the hybrid classifier founded on a hidden Markov model (HMM). The intrinsic limitation of the HMM is overcome by augmenting it with a wavelet support vector machine. This hybrid and hierarchical design paradigm improves the recognition performance by combining the advantages of different methods into an integral system. The continuous digit speech recognition experiments conducted with the proposed framework show promising results. It significantly improves the recognition performance at a low signal-to-noise ratio (SNR) without causing a poorer performance at a high SNR.
  • Keywords
    Bayes methods; hidden Markov models; speech enhancement; speech recognition; support vector machines; wavelet transforms; Bayesian separation; automatic speech recognition performance; classifier; continuous digit speech recognition; feature extraction; hidden Markov model; human perceptual modus; hybrid speech recognition; integral system; iterative algorithm; noise removal; perceptual wavelet domain; signal-to-noise ratio; sparsity promotion; speech enhancement; time-frequency space; wavelet filterbank; wavelet-based techniques; Feature extraction; Hidden Markov models; Noise; Speech; Speech enhancement; Speech recognition; Support vector machines; Bayesian theory; hidden Markov model (HMM); speech enhancement; speech recognition; support vector machine (SVM); wavelet transform;
  • fLanguage
    English
  • Journal_Title
    Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1083-4427
  • Type
    jour
  • DOI
    10.1109/TSMCA.2010.2069094
  • Filename
    5599309