  • DocumentCode
    2301717
  • Title
    Reliable voice activity detection algorithms under adverse environments
  • Author
    Stadtschnitzer, Michael ; Van Pham, Tuan ; Tan Chien, Tang
  • Author_Institution
    Signal Process. & Speech Commun. Lab., Graz Univ. of Technol., Graz
  • fYear
    2008
  • fDate
    4-6 June 2008
  • Firstpage
    218
  • Lastpage
    223
  • Abstract
    In this paper, two robust voice activity detection (VAD) algorithms are proposed for harsh environments. The first algorithm is based on a supervised neural network (NN) trained with the Levenberg-Marquardt algorithm: a two-layer feedforward NN operates on mel-frequency cepstral coefficients (MFCCs) extracted from noisy speech frames. The second algorithm is a threshold-based method that employs only a single subband power distance feature, calculated from wavelet coefficients at different wavelet subbands. An improved statistical percentile filtering technique based on long-term information is used to estimate the adaptive noise threshold more accurately. Both algorithms are tested on the TIMIT database, artificially distorted by different types of additive noise, and are compared with state-of-the-art VAD methods. The results show that both are very robust to different types of noise and in most conditions outperform standard VADs such as ETSI AFE ES 202 050 and ITU-T G.729 Annex B.
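    The abstract does not give the exact form of the percentile filter, so the following is only a minimal generic sketch of how a long-term percentile can drive an adaptive VAD threshold, not the authors' method. It assumes a precomputed per-frame feature (for example a subband power measure in dB); the function names `percentile` and `percentile_vad` and the parameters `window`, `q`, and `margin` with their default values are hypothetical choices for illustration.

```python
from collections import deque


def percentile(values, q):
    """Return the q-th percentile (0-100) of values, linearly interpolated."""
    s = sorted(values)
    if not s:
        raise ValueError("empty input")
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def percentile_vad(features, window=50, q=10.0, margin=3.0):
    """Label each frame as speech (True) or non-speech (False).

    Generic sketch, not the paper's algorithm. All defaults are hypothetical.
    features: per-frame feature values (assumed already computed, e.g. in dB).
    window:   number of most recent frames kept for the long-term noise estimate.
    q:        low percentile of that history taken as the noise floor.
    margin:   offset added to the noise floor to form the adaptive threshold.
    """
    history = deque(maxlen=window)  # sliding long-term memory of feature values
    decisions = []
    for x in features:
        history.append(x)
        noise_floor = percentile(history, q)  # low percentile tracks the noise level
        threshold = noise_floor + margin      # threshold adapts as the floor moves
        decisions.append(x > threshold)
    return decisions


if __name__ == "__main__":
    # Synthetic example: 20 quiet frames, 10 loud frames, 20 quiet frames.
    feats = [0.0] * 20 + [10.0] * 10 + [0.0] * 20
    print(percentile_vad(feats))
```

    Using a low percentile instead of a plain running mean keeps short speech bursts from inflating the noise estimate. In this sketch the floor only starts to rise once speech fills more than about (100 - q)% of the window, so the window length bounds how long continuous speech can be tracked.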
  • Keywords
    feature extraction; feedforward neural nets; filtering theory; iterative methods; signal detection; speech recognition; statistical analysis; wavelet transforms; TIMIT database; adaptive noise threshold estimation; additive noise; adverse environment; feedforward neural network; iterative Levenberg-Marquardt algorithm; mel-frequency cepstral coefficient; noisy speech frame; robust voice activity detection algorithm; statistical percentile filtering; supervised neural network; threshold-based method; wavelet coefficient; Adaptive filters; Cepstral analysis; Data mining; Detection algorithms; Information filtering; Neural networks; Robustness; Speech; Wavelet coefficients; Working environment noise; MFCC; neural network; statistical percentile filtration; time-scale feature; voice activity detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Communications and Electronics, 2008. ICCE 2008. Second International Conference on
  • Conference_Location
    Hoi An
  • Print_ISBN
    978-1-4244-2425-2
  • Electronic_ISBN
    978-1-4244-2426-9
  • Type
    conf
  • DOI
    10.1109/CCE.2008.4578961
  • Filename
    4578961