• DocumentCode
    694549
  • Title
    Onset detection algorithm in voice activity detection for Mandarin
  • Author
    Huan Wang; Lei Wang
  • Author_Institution
    Sch. of Inf. & Commun. Eng., Beijing Univ. of Posts & Telecommun., Beijing, China
  • fYear
    2013
  • fDate
    12-13 Oct. 2013
  • Firstpage
    1148
  • Lastpage
    1151
  • Abstract
    Voice activity detection (VAD) is one of the most challenging problems in speech signal processing. Statistical model-based VADs have been widely studied in the recent literature; they usually rely on hangover algorithms to prevent clipping of weak speech tails. However, little attention has been paid to initial consonants, and non-negligible onset detection errors may be incurred, especially when the SNR is low. Since most Mandarin syllables start with initial consonants, this paper proposes an onset detection algorithm to improve VAD performance for Mandarin. Although consonants are mostly noise-like, their spectral energy is distributed more towards the higher frequencies. Exploiting this characteristic, the proposed algorithm uses the posterior SNR of the high-frequency band to decide whether detection of a weak start could have been dampened by noise, and then makes a corresponding correction after estimating whether weak-start speech frames have been mistaken for nonspeech frames. The results show that the proposed algorithm achieves a considerable performance improvement. Furthermore, the algorithm is independent of noise type.
  • Keywords
    maximum likelihood estimation; natural language processing; speech recognition; Mandarin syllables; hangover algorithms; high-frequency band; initial consonants; noise type; noise-like consonants; nonnegligible onset detection errors; nonspeech frames; onset detection algorithm; performance improvement; posterior SNR; spectral energy distribution; speech signal processing; statistical model-based VAD; voice activity detection; weak-speech tail clipping prevention; weak-start detection; weak-start speech frames; Detection algorithms; Hidden Markov models; Signal processing algorithms; Signal to noise ratio; Speech; Speech processing; likelihood ratio test; onset detection; voice activity detection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Network Technology (ICCSNT), 2013 3rd International Conference on
  • Conference_Location
    Dalian
  • Type
    conf
  • DOI
    10.1109/ICCSNT.2013.6967305
  • Filename
    6967305
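
The abstract describes a decision based on the posterior SNR of the high-frequency band. The record does not give the paper's formulas, so the following is only a minimal sketch of that general idea, not the authors' algorithm. It assumes the common definition of the a posteriori SNR as noisy-frame power divided by estimated noise power per frequency bin. The function names, the `first_high_bin` cutoff, and the `threshold_db` value are hypothetical and chosen for illustration.

```python
import math


def high_band_posterior_snr_db(frame_power, noise_power, first_high_bin):
    """Mean a posteriori SNR in dB over bins first_high_bin and above.

    Assumption: posterior SNR per bin = frame power / estimated noise power.
    The band edge (first_high_bin) is a hypothetical parameter.
    """
    if len(frame_power) != len(noise_power):
        raise ValueError("frame_power and noise_power must have equal length")
    high_frame = frame_power[first_high_bin:]
    high_noise = noise_power[first_high_bin:]
    if not high_frame:
        raise ValueError("first_high_bin leaves no bins in the high band")
    # Floor the noise estimate to avoid division by zero.
    ratios = [p / max(n, 1e-12) for p, n in zip(high_frame, high_noise)]
    mean_ratio = sum(ratios) / len(ratios)
    # Floor the mean so that an all-zero frame does not break log10.
    return 10.0 * math.log10(max(mean_ratio, 1e-12))


def flag_possible_onsets(frames_power, noise_power, first_high_bin, threshold_db):
    """Mark frames whose high-band posterior SNR exceeds a threshold.

    Illustrative only: a flagged frame is a candidate weak-start frame that a
    base VAD may have missed. The threshold is not taken from the paper.
    """
    return [
        high_band_posterior_snr_db(f, noise_power, first_high_bin) > threshold_db
        for f in frames_power
    ]


if __name__ == "__main__":
    noise = [1.0, 1.0, 1.0, 1.0]
    frames = [
        [1.0, 1.0, 1.0, 1.0],    # noise-only frame
        [1.0, 1.0, 10.0, 10.0],  # energy concentrated in the high bins
    ]
    print(flag_possible_onsets(frames, noise, first_high_bin=2, threshold_db=3.0))
```

In a full system the per-bin noise power would come from a noise tracker and the flags would feed a correction step on the base VAD decisions; both are outside this sketch.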