• DocumentCode
    3484178
  • Title

    Speech adaptation using neural networks for connected digit recognition

  • Author

    Cheng, Xuelin ; Wang, Han ; Li, Zongge

  • Author_Institution
    Dept. of Comput. Sci., Fudan Univ., Shanghai, China
  • Volume
    5
  • fYear
    2002
  • fDate
    18-22 Nov. 2002
  • Firstpage
    2401
  • Abstract
    The performance of speech recognizers usually degrades when they are used in a new environment, owing to differences in channel, speech rate, and similar factors. Retraining a recognizer demands a large amount of new data recorded in the new environment. By contrast, adaptation can fit the characteristics of the new environment using only a small amount of data. In this paper, neural-network-based adaptation is applied to improve the performance of a connected digit recognition system, chosen for the network's ability to compute nonlinear functions. A baseline system was built on the OGI Number corpus and achieved 97.76% word accuracy and 85.19% sentence accuracy on that corpus. However, when tested on the Australia English Telephone Speech Database, its performance dropped sharply to 71.86% and 16.67%, respectively. To avoid retraining the recognizer, a feed-forward backpropagation network was used to fit the characteristics of the new data; combined with maximum likelihood linear regression, it reduced the error rate by 53%.
  • Keywords
    backpropagation; feedforward neural nets; hidden Markov models; maximum likelihood estimation; regression analysis; speech recognition; Australia English Telephone Speech Database; OGI Number corpus; baseline system; connected digit recognition; context-dependent units; expectation-maximization technique; feedforward backpropagation network; hidden Markov model speech recognizers; maximum likelihood linear regression; neural network based adaptation; nonlinear function; sentence accuracy; speech adaptation; speech recognizers performance; word accuracy; Australia; Character recognition; Computer networks; Databases; Degradation; Feedforward systems; Neural networks; Speech recognition; Telephony; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Information Processing, 2002. ICONIP '02. Proceedings of the 9th International Conference on
  • Print_ISBN
    981-04-7524-1
  • Type

    conf

  • DOI
    10.1109/ICONIP.2002.1201924
  • Filename
    1201924