  • DocumentCode
    1184941
  • Title
    Speech-Signal-Based Frequency Warping
  • Author
    Paliwal, Kuldip ; Shannon, Benjamin ; Lyons, James ; Wójcicki, Kamil
  • Author_Institution
    Signal Process. Lab., Griffith Univ., Nathan, QLD
  • Volume
    16
  • Issue
    4
  • fYear
    2009
  • fDate
    4/1/2009 12:00:00 AM
  • Firstpage
    319
  • Lastpage
    322
  • Abstract
    The speech signal is used for transmission of linguistic information. High-energy portions of the speech spectrum have higher signal-to-noise ratios than the low-energy portions and are therefore more robust to noise. Since the speech signal is known to be very robust to noise, the high-energy regions of the speech spectrum are expected to carry the majority of the linguistic information. This letter attempts to derive a frequency warping function directly from the speech signal by sampling the frequency axis nonuniformly, with the high-energy regions sampled more densely than the low-energy regions. To achieve this, an ensemble average short-time power spectrum is computed from a large speech corpus. The speech-signal-based frequency warping is then obtained by dividing the log spectrum into equal-area portions. The proposed frequency warping is shown to be similar to the frequency scales obtained through psychoacoustic experiments, namely the mel and Bark scales. The warping is then used in filterbank design for automatic speech recognition experiments. The results show that cepstral features based on the proposed warping perform comparably to mel-frequency cepstral coefficients under clean conditions and outperform them under noisy conditions.
  • Keywords
    filtering theory; speech recognition; automatic speech recognition; average short-time power spectrum; filter bank design; linguistic information transmission; log spectrum; mel-frequency cepstral coefficients; psycho-acoustic; signal-to-noise ratios; speech spectrum; speech-signal-based frequency warping; Auditory system; Automatic speech recognition; Cepstral analysis; Frequency; Noise robustness; Production systems; Psychology; Signal to noise ratio; Speech enhancement; Speech processing; Bark scale; mel scale; robust automatic speech recognition (ASR); speech-signal-based frequency cepstral coefficient (SFCC); speech-signal-based frequency warping;
  • fLanguage
    English
  • Journal_Title
    Signal Processing Letters, IEEE
  • Publisher
    IEEE
  • ISSN
    1070-9908
  • Type
    jour
  • DOI
    10.1109/LSP.2009.2014096
  • Filename
    4797893
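
The abstract describes the warping derivation only in outline: average the short-time power spectrum over a corpus, take its logarithm, and split the result into equal-area portions. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the frame length and hop, the Hamming window, the shift that makes the log spectrum non-negative before integration, and the linear interpolation of band edges are all assumptions made for illustration; the record does not specify any of them.

```python
import numpy as np


def average_power_spectrum(signals, frame_len=256, hop=128):
    """Ensemble average of short-time power spectra over all frames of all signals.

    frame_len, hop and the Hamming window are illustrative choices.
    """
    window = np.hamming(frame_len)
    total = np.zeros(frame_len // 2 + 1)
    count = 0
    for x in signals:
        x = np.asarray(x, dtype=float)
        for start in range(0, len(x) - frame_len + 1, hop):
            frame = x[start:start + frame_len] * window
            total += np.abs(np.fft.rfft(frame)) ** 2
            count += 1
    if count == 0:
        raise ValueError("no complete frames in the input signals")
    return total / count


def equal_area_warping(avg_power, sample_rate, floor=1e-12):
    """Return (freqs, warp), where warp rises from 0 to 1 with the
    cumulative area under the log spectrum."""
    log_spec = np.log(np.maximum(avg_power, floor))
    # Assumption: shift the log spectrum so it is non-negative; otherwise
    # "area" is not well defined when log values fall below zero.
    density = log_spec - log_spec.min()
    freqs = np.linspace(0.0, sample_rate / 2.0, len(avg_power))
    # Cumulative trapezoidal integration along the frequency axis.
    steps = 0.5 * (density[1:] + density[:-1]) * np.diff(freqs)
    area = np.concatenate(([0.0], np.cumsum(steps)))
    if area[-1] <= 0.0:
        raise ValueError("log spectrum is flat; warping is undefined")
    return freqs, area / area[-1]


def band_edges(freqs, warp, n_bands):
    """Frequencies that split the warped axis into n_bands equal-area bands."""
    targets = np.linspace(0.0, 1.0, n_bands + 1)
    # Invert the monotone warping curve by linear interpolation.
    return np.interp(targets, warp, freqs)
```

A hypothetical usage would pass a list of waveforms to `average_power_spectrum`, feed the result and the sampling rate to `equal_area_warping`, and then call `band_edges` to obtain filterbank edge frequencies. Bands come out narrower where the shifted log spectrum is large, which matches the abstract's description of sampling high-energy regions more densely.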