• DocumentCode
    1493273
  • Title
    Data Balancing for Efficient Training of Hybrid ANN/HMM Automatic Speech Recognition Systems
  • Author
    García-Moral, Ana Isabel ; Solera-Ureña, Rubén ; Peláez-Moreno, Carmen ; Díaz-de-María, Fernando
  • Author_Institution
    Signal Process. & Commun. Dept., Univ. Carlos III of Madrid, Leganés, Spain
  • Volume
    19
  • Issue
    3
  • fYear
    2011
  • fDate
    3/1/2011 12:00:00 AM
  • Firstpage
    468
  • Lastpage
    481
  • Abstract
    Hybrid speech recognizers, in which the emission pdfs of the hidden Markov model (HMM) states are estimated by artificial neural networks (ANNs) rather than by the usual Gaussian mixture models (GMMs), have several advantages over classical systems. However, obtaining performance improvements comes at a heavily increased computational cost because of the need to train the ANN. Starting from the observation that speech data are remarkably skewed, this paper proposes sifting the training set and balancing the number of samples per class. With this method, the training time has been reduced by a factor of 18 while obtaining performance similar to or even better than that obtained with the whole database, especially in noisy environments. However, applying these reduced sets is not straightforward. To avoid the mismatch between training and testing conditions created by modifying the distribution of the training data, the a posteriori probabilities obtained must be properly scaled and the context window resized, as demonstrated in this paper.
  • Keywords
    hidden Markov models; learning (artificial intelligence); speech recognition; Gaussian mixture models; a posteriori probability; artificial neural networks; data balancing; hidden Markov models; hybrid ANN/HMM automatic speech recognition; training set; Artificial neural networks; Automatic speech recognition; Databases; Hidden Markov models; Noise reduction; Speech recognition; State estimation; Testing; Training data; Working environment noise; ANN/HMM; Active learning; MLP/HMM; additive noise; artificial neural networks (ANNs); hidden Markov models (HMMs); hybrid automatic speech recognition (ASR); machine learning; multilayer perceptrons (MLPs); robust ASR
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2010.2050513
  • Filename
    5466113