• DocumentCode
    232246
  • Title

    Evaluating VAD for automatic speech recognition

  • Author

    Sibo Tong ; Nanxin Chen ; Yanmin Qian ; Kai Yu

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
  • fYear
    2014
  • fDate
    19-23 Oct. 2014
  • Firstpage
    2308
  • Lastpage
    2314
  • Abstract
    Voice activity detection (VAD) plays a crucial role in speech processing, especially in automatic speech recognition (ASR). It identifies the boundaries of the speech to be recognized, and the accuracy of those boundaries may significantly affect recognition performance. Conventional VAD evaluation criteria are mostly based on the frame-level accuracy of speech/non-speech classification, which may result in a weak correlation between VAD and ASR performance. Although some VAD evaluation criteria consider boundary effects, there has not been an effective overall criterion suitable for evaluating the effect of VAD on ASR. This paper proposes an integrated VAD evaluation criterion that takes various boundary effects into account. Experiments on an English Switchboard task showed that the conventional frame-accuracy-based VAD criterion has a weak and unstable correlation with word error rate, while the proposed overall criterion is much more stably correlated with word error rate.
  • Keywords
    signal classification; speech recognition; ASR; English Switchboard task; VAD; automatic speech recognition; boundary effects; speech-nonspeech classification; voice activity detection; Accuracy; Correlation; Error analysis; Measurement; Speech; evaluation metric
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing (ICSP), 2014 12th International Conference on
  • Conference_Location
    Hangzhou
  • ISSN
    2164-5221
  • Print_ISBN
    978-1-4799-2188-1
  • Type

    conf

  • DOI
    10.1109/ICOSP.2014.7015406
  • Filename
    7015406