• DocumentCode
    788286
  • Title
    Performance Estimation of Speech Recognition System Under Noise Conditions Using Objective Quality Measures and Artificial Voice
  • Author
    Yamada, Takeshi ; Kumakura, Masakazu ; Kitawaki, Nobuhiko
  • Author_Institution
    Graduate Sch. of Syst. & Inf. Eng., Tsukuba Univ.
  • Volume
    14
  • Issue
    6
  • fYear
    2006
  • Firstpage
    2006
  • Lastpage
    2013
  • Abstract
    It is essential to ensure quality of service (QoS) when offering a speech recognition service for use in noisy environments. This means that the recognition performance in the target noise environment must be investigated. One approach is to estimate the recognition performance from a distortion value, which represents the difference between noisy speech and its original clean version. Previously, estimation methods using the segmental signal-to-noise ratio (SNRseg), the cepstral distance (CD), and the perceptual evaluation of speech quality (PESQ) have been proposed. However, their estimation accuracy has not been verified for the case when a noise reduction algorithm is adopted as a preprocessing stage in speech recognition. We therefore evaluated the effectiveness of these distortion measures by experiments using the AURORA-2J connected digit recognition task and four different noise reduction algorithms. The results showed that in each case the distortion measure correlates well with the word accuracy when the estimators used are optimized for each individual noise reduction algorithm. In addition, it was confirmed that when a single estimator, optimized for all the noise reduction algorithms, is used, the PESQ method gives a more accurate estimate than SNRseg and CD. Furthermore, we have proposed the use of an artificial voice of several seconds' duration instead of a large amount of real speech and confirmed that a relatively accurate estimate can be obtained by using the artificial voice.
  • Keywords
    acoustic noise; quality of service; speech processing; speech recognition; QoS; artificial voice; cepstral distance; noisy speech; objective quality measures; perceptual evaluation of speech quality; segmental signal-to-noise ratio; speech preprocessing stage; speech recognition system; Cepstral analysis; Distortion measurement; Noise measurement; Noise reduction; Signal to noise ratio; Speech analysis; Target recognition; Working environment noise; performance estimation
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2006.883254
  • Filename
    1709890