Title :
Performance Estimation of Speech Recognition System Under Noise Conditions Using Objective Quality Measures and Artificial Voice
Author :
Yamada, Takeshi ; Kumakura, Masakazu ; Kitawaki, Nobuhiko
Author_Institution :
Graduate Sch. of Syst. & Inf. Eng., Tsukuba Univ.
Abstract :
It is essential to ensure quality of service (QoS) when offering a speech recognition service for use in noisy environments. This means that the recognition performance in the target noise environment must be investigated. One approach is to estimate the recognition performance from a distortion value, which represents the difference between noisy speech and its original clean version. Previously, estimation methods using the segmental signal-to-noise ratio (SNRseg), the cepstral distance (CD), and the perceptual evaluation of speech quality (PESQ) have been proposed. However, their estimation accuracy has not been verified for the case when a noise reduction algorithm is adopted as a preprocessing stage in speech recognition. We, therefore, evaluated the effectiveness of these distortion measures by experiments using the AURORA-2J connected digit recognition task and four different noise reduction algorithms. The results showed that in each case the distortion measure correlates well with the word accuracy when the estimators used are optimized for each individual noise reduction algorithm. In addition, it was confirmed that when a single estimator, optimized for all the noise reduction algorithms, is used, the PESQ method gives a more accurate estimate than SNRseg and CD. Furthermore, we have proposed the use of artificial voice of several seconds' duration instead of a large amount of real speech and confirmed that a relatively accurate estimate can be obtained by using the artificial voice.
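Of the three distortion measures named in the abstract, SNRseg is the simplest to illustrate. The following is a minimal sketch of one common formulation (the mean of per-frame SNR values in dB, each clamped to a fixed range), not necessarily the exact definition used in the paper. The function name, the frame length of 256 samples, and the clamping limits of -10 dB and 35 dB are assumptions chosen for illustration only.

```python
import numpy as np


def segmental_snr(clean, noisy, frame_len=256, snr_min=-10.0, snr_max=35.0):
    """Mean per-frame SNR in dB between a clean signal and its noisy version.

    frame_len, snr_min and snr_max are illustrative assumptions, not
    values taken from the paper.
    """
    clean = np.asarray(clean, dtype=float)
    noisy = np.asarray(noisy, dtype=float)

    # Use only whole frames that both signals cover.
    n_frames = min(len(clean), len(noisy)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")

    eps = 1e-12  # guards against division by zero and log of zero
    frame_snrs = []
    for i in range(n_frames):
        start = i * frame_len
        s = clean[start:start + frame_len]
        d = noisy[start:start + frame_len] - s  # distortion in this frame
        snr_db = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(d ** 2) + eps))
        # Clamp so silent or distortion-free frames do not dominate the mean.
        frame_snrs.append(np.clip(snr_db, snr_min, snr_max))

    return float(np.mean(frame_snrs))
```

In the approach the abstract describes, a distortion value such as this one would then be passed to an estimator that maps it to a predicted word accuracy. The form of that estimator is not given in the abstract, so it is left out of the sketch.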
Keywords :
acoustic noise; quality of service; speech processing; speech recognition; QoS; artificial voice; cepstral distance; noisy speech; objective quality measures; perceptual evaluation of speech quality; segmental signal-to-noise ratio; speech preprocessing stage; speech recognition system; Cepstral analysis; Distortion measurement; Noise measurement; Noise reduction; Signal to noise ratio; Speech analysis; Target recognition; Working environment noise; performance estimation
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2006.883254