DocumentCode
232246
Title
Evaluating VAD for automatic speech recognition
Author
Sibo Tong ; Nanxin Chen ; Yanmin Qian ; Kai Yu
Author_Institution
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
fYear
2014
fDate
19-23 Oct. 2014
Firstpage
2308
Lastpage
2314
Abstract
Voice activity detection (VAD) plays a crucial role in speech processing, especially in automatic speech recognition (ASR). It identifies the boundaries of the speech to be recognized, and the accuracy of these boundaries may significantly affect recognition performance. Conventional VAD evaluation criteria are mostly based on frame-level accuracy of speech/non-speech classification, which may result in weak correlation between VAD and ASR performance. Although some VAD evaluation criteria consider boundary effects, there has not been an effective overall criterion suitable for evaluating the effect of VAD on ASR. This paper proposes an integrated VAD evaluation criterion that takes various boundary effects into account. Experiments on an English Switchboard task showed that the conventional frame-accuracy-based VAD criterion has a weak and unstable correlation with word error rate, while the proposed overall criterion is much more stably correlated with word error rate.
Keywords
signal classification; speech recognition; ASR; English Switchboard task; VAD; automatic speech recognition; boundary effects; speech-nonspeech classification; voice activity detection; Accuracy; Automatic speech recognition; Correlation; Error analysis; Measurement; Speech; evaluation metric
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing (ICSP), 2014 12th International Conference on
Conference_Location
Hangzhou
ISSN
2164-5221
Print_ISBN
978-1-4799-2188-1
Type
conf
DOI
10.1109/ICOSP.2014.7015406
Filename
7015406