• DocumentCode
    178710
  • Title

    Black box optimization for automatic speech recognition

  • Author

    Watanabe, Shinji ; Le Roux, Jonathan

  • Author_Institution
    Mitsubishi Electr. Res. Labs. (MERL), Cambridge, MA, USA
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    3256
  • Lastpage
    3260
  • Abstract
    State-of-the-art automatic speech recognition (ASR) systems are very complex, combining multiple techniques and involving many types of tuning parameters (e.g., numbers of states and Gaussians in HMMs, numbers of neurons/layers and learning rates in neural networks, etc.). To reach optimal performance in such systems, deep understanding and expertise of each component is necessary, thus limiting the development of ASR systems to skilled experts. To overcome the problem, this paper studies the use of black box optimization, which automatically tunes systems without any prior knowledge. We consider an ASR system as a function with tuning parameters as input and speech recognition performance (e.g., word accuracy) as output, and we investigate two probabilistic black box optimization techniques: Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and Bayesian optimization using Gaussian process. Middle-vocabulary speech recognition experiments show the effectiveness of black box optimization, as performance approaching that of fine-tuned systems obtained by experts and/or outperforming that of sub-optimal systems can be automatically obtained.
  • Keywords
    Bayes methods; Gaussian processes; covariance analysis; optimisation; probability; speech recognition; vocabulary; ASR system; Bayesian optimization; CMA-ES; Gaussian process; HMM; automatic speech recognition system; covariance matrix adaptation evolution strategy; middle-vocabulary speech recognition experiment; neural network; probabilistic black box optimization technique; Bayes methods; Hidden Markov models; Optimization; Probabilistic logic; Speech recognition; Training; Tuning; Bayesian optimization; Black box optimization; CMA-ES; Gaussian process; Speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type

    conf

  • DOI
    10.1109/ICASSP.2014.6854202
  • Filename
    6854202