  • DocumentCode
    1492222
  • Title
    Constrained Iterative Speech Enhancement Using Phonetic Classes
  • Author
    Das, Amit; Hansen, John H. L.
  • Author_Institution
    Dept. of Electr. Eng., Univ. of Texas at Dallas, Richardson, TX, USA
  • Volume
    20
  • Issue
    6
  • fYear
    2012
  • Firstpage
    1869
  • Lastpage
    1883
  • Abstract
    Noise does not affect all phonemes equally, since its influence depends on each phoneme's distinct acoustic properties. In this study, the problem of selectively enhancing speech based on broad phoneme classes is addressed using Auto-LSP, a constrained iterative speech enhancement algorithm. Multiple enhanced utterances are generated for every noisy utterance by varying the Auto-LSP parameters. The noisy utterance is then partitioned into segments based on broad-level phoneme classes, and constraints are applied to each segment using a hard-decision solution. To alleviate the effect of hard-decision errors, a Gaussian mixture model (GMM)-based maximum-likelihood (ML) soft-decision solution is also presented. The resulting utterances are evaluated over the TIMIT speech corpus using the Itakura-Saito (IS) distance, segmental signal-to-noise ratio (SNR), and perceptual evaluation of speech quality (PESQ) metrics, across four noise types at three SNR levels. Comparative assessment against baseline enhancement algorithms such as Auto-LSP, log-minimum mean squared error (log-MMSE), and log-MMSE with speech presence uncertainty (log-MMSE-SPU) demonstrates that the proposed solution improves speech quality more consistently across most of the phoneme classes and noise types considered in this study.
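    Segmental SNR, one of the evaluation metrics named above, averages per-frame SNR values in decibels rather than computing one global ratio. The Python sketch below is illustrative only and is not the authors' evaluation code: the function name `segmental_snr`, the non-overlapping 256-sample frames, and the [-10, 35] dB per-frame clamp are assumptions (the clamp is a common convention in the speech enhancement literature, not something stated in this record).

```python
import math
from typing import Sequence


def segmental_snr(clean: Sequence[float], enhanced: Sequence[float],
                  frame_len: int = 256, floor_db: float = -10.0,
                  ceil_db: float = 35.0) -> float:
    """Mean per-frame SNR (dB) of `enhanced` measured against `clean`.

    Hypothetical helper. Assumptions (not from the paper): non-overlapping
    frames of `frame_len` samples, any trailing partial frame is dropped,
    and each frame's SNR is clamped to [floor_db, ceil_db].
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have equal length")
    n_frames = len(clean) // frame_len
    if n_frames == 0:
        raise ValueError("signal shorter than one frame")
    eps = 1e-12  # avoids log of zero and division by zero on silent frames
    total = 0.0
    for k in range(n_frames):
        start = k * frame_len
        sig_energy = 0.0
        err_energy = 0.0
        for i in range(start, start + frame_len):
            sig_energy += clean[i] ** 2
            err_energy += (clean[i] - enhanced[i]) ** 2
        snr_db = 10.0 * math.log10((sig_energy + eps) / (err_energy + eps))
        # Clamp so silent or perfectly reconstructed frames do not dominate.
        total += min(max(snr_db, floor_db), ceil_db)
    return total / n_frames


if __name__ == "__main__":
    clean = [1.0] * 512
    enhanced = [0.9] * 512
    print(round(segmental_snr(clean, enhanced), 3))
```

    With a constant reference of 1.0 and an estimate of 0.9, every frame has a signal-to-error energy ratio of 100, so the sketch returns about 20 dB; a perfect estimate saturates at the 35 dB ceiling.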
  • Keywords
    iterative methods; least mean squares methods; maximum likelihood estimation; speech enhancement; Gaussian mixture model-based maximum-likelihood soft decision solution; Itakura-Saito; TIMIT speech corpus; auto-LSP parameters; baseline enhancement algorithms; broad level phoneme classes; constrained iterative speech enhancement; distinct acoustic properties; hard decision errors; hard decision solution; log-minimum mean squared error; multiple enhanced utterances; noise types; noisy utterance; perceptual evaluation; phonetic classes; segmental signal-to-noise ratio; speech presence uncertainty; speech quality metrics; Correlation; Hidden Markov models; Noise measurement; Signal to noise ratio; Speech; Speech enhancement; Auditory masked threshold; Auto-LSP
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2012.2191282
  • Filename
    6182579