  • DocumentCode
    1686141
  • Title
    Voice activity detection based on frequency modulation of harmonics
  • Author
    Chung-Chien Hsu ; Tse-En Lin ; Jian-Hueng Chen ; Tai-Shih Chi
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
  • fYear
    2013
  • Firstpage
    6679
  • Lastpage
    6683
  • Abstract
    In this paper, we propose a voice activity detection (VAD) algorithm based on the spectro-temporal modulation structures of input sounds. A multi-resolution spectro-temporal analysis framework is used to inspect prominent speech structures. The proposed VAD distinguishes speech from non-speech by comparing the energy of the frequency modulation of harmonics against an adaptive threshold. It significantly outperforms three standard VADs (ITU-T G.729B, ETSI AMR1, and ETSI AMR2) in non-stationary noise, in terms of both the receiver operating characteristic (ROC) curves and the recognition rates obtained from a practical distributed speech recognition (DSR) system.
  • Keywords
    frequency modulation; sensitivity analysis; speech recognition; DSR system; ETSI AMR1; ETSI AMR2; ITU-T G.729B; ROC curves; VAD algorithm; adaptive threshold; distributed speech recognition system; harmonics frequency modulation; input sounds; multiresolution spectro-temporal analysis framework; nonstationary noises; receiver operating characteristic curves; recognition rates; spectro-temporal modulation structures; speech structures; voice activity detection algorithm; Frequency modulation; Harmonic analysis; Noise; Spectrogram; Speech; Speech recognition; frequency modulation; spectro-temporal analysis; voice activity detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2013.6638954
  • Filename
    6638954
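
The abstract describes a decision rule in which a per-frame energy is compared with an adaptive threshold. The record does not say how the threshold is adapted, so the following is only a minimal sketch under stated assumptions: the function name `adaptive_threshold_vad`, the parameters `init_frames`, `margin`, and `alpha`, and the running noise-floor update are all hypothetical and are not taken from the paper. The input `energies` stands in for whatever per-frame feature energy a front end produces.

```python
def adaptive_threshold_vad(energies, init_frames=5, margin=3.0, alpha=0.9):
    """Label each frame as speech (True) or non-speech (False).

    Illustrative only: the threshold rule below is an assumption,
    not the method of the cited paper.
    """
    if init_frames < 1 or len(energies) < init_frames:
        raise ValueError("need at least init_frames (>= 1) frames")

    # Assumption: the first init_frames frames contain no speech,
    # so their mean energy gives an initial noise-floor estimate.
    noise = sum(energies[:init_frames]) / init_frames

    decisions = []
    for e in energies:
        # The threshold tracks the current noise-floor estimate.
        threshold = margin * noise
        is_speech = e > threshold
        decisions.append(is_speech)
        # Update the noise floor only on frames judged non-speech,
        # so speech energy does not inflate the threshold.
        if not is_speech:
            noise = alpha * noise + (1 - alpha) * e
    return decisions


if __name__ == "__main__":
    frames = [1.0] * 5 + [10.0] * 3 + [1.0] * 2
    print(adaptive_threshold_vad(frames))
```

Updating the noise estimate only on non-speech frames is one common design choice for letting the threshold follow slowly varying background noise; other adaptation schemes are equally plausible here.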