• DocumentCode
    302299
  • Title
    Towards robustness to fast speech in ASR
  • Author
    Mirghafori, Nikki ; Fosler, Eric ; Morgan, Nelson
  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci., California Univ., Berkeley, CA, USA
  • Volume
    1
  • fYear
    1996
  • fDate
    7-10 May 1996
  • Firstpage
    335
  • Abstract
    Psychoacoustic studies show that human listeners are sensitive to speaking rate variations. Automatic speech recognition (ASR) systems are even more affected by changes in rate: on many ASR systems, word recognition error rates for fast speakers have been observed to be two to four times those for average speakers. In our earlier work (see Proceedings of EUROSPEECH95, pp. 491-494, 1995), we studied the causes of the higher error and concluded that both acoustic-phonetic and phonological differences contribute to the higher word error rates. In this work, we have studied various measures for quantifying rate of speech (ROS) and used simple methods for estimating the speaking rate of a novel utterance using ASR technology. We have also implemented mechanisms that make our ASR system more robust to fast speech. Using our ROS estimator to identify fast sentences in the test set, our rate-dependent system has 24.5% fewer errors on the fastest sentences and 6.2% fewer errors on all sentences of the WSJ93 evaluation set, relative to the baseline HMM/MLP system.
  • Keywords
    acoustic signal processing; hidden Markov models; speech processing; speech recognition; ASR systems; WSJ93 evaluation set; acoustic-phonetic differences; automatic speech recognition; baseline HMM/MLP system; fast speakers; fast speech analysis; human listeners; phonological differences; psychoacoustic studies; rate of speech; rate-dependent system; sentences; speaking rate variations; word recognition error rates; Acoustic measurements; Automatic speech recognition; Computer science; Error analysis; Hidden Markov models; Humans; Loudspeakers; Psychology; Robustness; Speech recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
  • Conference_Location
    Atlanta, GA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-3192-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.1996.541100
  • Filename
    541100