Performance Estimation of Speech Recognition System Under Noise Conditions Using Objective Quality Measures and Artificial Voice

Author

Yamada, Takeshi; Kumakura, Masakazu; Kitawaki, Nobuhiko

Author_Institution

Graduate School of Systems and Information Engineering, University of Tsukuba

Volume

14

Issue

6

fYear

2006

Firstpage

2006

Lastpage

2013

Abstract

It is essential to ensure quality of service (QoS) when offering a speech recognition service for use in noisy environments, which means that the recognition performance in the target noise environment must be investigated. One approach is to estimate the recognition performance from a distortion value, which represents the difference between the noisy speech and its original clean version. Estimation methods using the segmental signal-to-noise ratio (SNRseg), the cepstral distance (CD), and the perceptual evaluation of speech quality (PESQ) have been proposed previously. However, their estimation accuracy has not been verified for cases in which a noise reduction algorithm is adopted as a preprocessing stage in speech recognition. We therefore evaluated the effectiveness of these distortion measures in experiments using the AURORA-2J connected digit recognition task and four different noise reduction algorithms. The results showed that, in each case, the distortion measure correlates well with word accuracy when the estimator is optimized for each individual noise reduction algorithm. In addition, it was confirmed that when a single estimator optimized for all the noise reduction algorithms is used, the PESQ method gives a more accurate estimate than SNRseg and CD. Furthermore, we proposed using an artificial voice several seconds long instead of a large amount of real speech, and confirmed that a relatively accurate estimate can be obtained with the artificial voice.
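The idea of estimating recognition performance from a distortion value can be sketched as two steps: compute a distortion measure between clean and processed speech, then map it to word accuracy with a fitted estimator. The sketch below uses SNRseg as the distortion measure. The frame length (256 samples), the [-10, 35] dB per-frame clipping range, and the straight-line least-squares estimator are all assumptions chosen for illustration; the abstract does not state the paper's frame settings or the functional form of its estimators, and the names `segmental_snr` and `fit_linear_estimator` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def segmental_snr(clean: Sequence[float], noisy: Sequence[float],
                  frame_len: int = 256, floor_db: float = -10.0,
                  ceil_db: float = 35.0, eps: float = 1e-10) -> float:
    """Mean per-frame SNR (dB) between clean speech and its distorted version.

    Assumed settings: non-overlapping frames of `frame_len` samples and
    per-frame clipping to [floor_db, ceil_db]; not taken from the paper.
    """
    if len(clean) != len(noisy):
        raise ValueError("signals must be time-aligned and of equal length")
    n_frames = len(clean) // frame_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    total = 0.0
    for m in range(n_frames):
        start = m * frame_len
        sig_energy = 0.0
        err_energy = 0.0
        for s, y in zip(clean[start:start + frame_len],
                        noisy[start:start + frame_len]):
            sig_energy += s * s
            err_energy += (s - y) ** 2  # distortion = noisy minus clean
        snr_db = 10.0 * math.log10((sig_energy + eps) / (err_energy + eps))
        total += min(max(snr_db, floor_db), ceil_db)  # clip outlier frames
    return total / n_frames


def fit_linear_estimator(distortions: List[float],
                         accuracies: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit: accuracy ~= slope * distortion + intercept.

    The linear form is a hypothetical stand-in for the paper's estimator.
    """
    n = len(distortions)
    mean_d = sum(distortions) / n
    mean_a = sum(accuracies) / n
    cov = sum((d - mean_d) * (a - mean_a)
              for d, a in zip(distortions, accuracies))
    var = sum((d - mean_d) ** 2 for d in distortions)
    slope = cov / var
    return slope, mean_a - slope * mean_d


if __name__ == "__main__":
    # Synthetic signals only: a sine "clean" signal and a scaled copy.
    clean = [math.sin(2 * math.pi * k / 32) for k in range(1024)]
    noisy = [1.1 * s for s in clean]
    print("SNRseg (dB):", segmental_snr(clean, noisy))
```

In use, one would compute the distortion value for each calibration condition, pair it with the word accuracy measured for that condition, fit the estimator once, and then predict word accuracy for a new noise environment from its distortion value alone. Fitting one estimator per noise reduction algorithm versus one shared estimator corresponds to the two cases compared in the abstract.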

Keywords

acoustic noise; quality of service; speech processing; speech recognition; QoS; artificial voice; cepstral distance; noisy speech; objective quality measures; perceptual evaluation of speech quality; segmental signal-to-noise ratio; speech preprocessing stage; speech recognition system; Cepstral analysis; Distortion measurement; Noise measurement; Noise reduction; Signal to noise ratio; Speech analysis; Target recognition; Working environment noise; performance estimation

fLanguage

English

Journal_Title

IEEE Transactions on Audio, Speech, and Language Processing

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2006.883254

Filename

1709890