• DocumentCode
    1690472
  • Title

    Modeling heterogeneous data sources for speech recognition using synchronous hidden Markov models

  • Author

    Yong Zhao ; Biing-Hwang Juang

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
  • fYear
    2013
  • Firstpage
    7403
  • Lastpage
    7407
  • Abstract
    In this paper, we propose a novel acoustic modeling framework, the synchronous HMM, which takes full advantage of the capacity of heterogeneous data sources and achieves an optimal balance between modeling accuracy and robustness. The synchronous HMM introduces an additional layer of substates between the HMM states and the Gaussian component variables. The substates can register long-span non-phonetic attributes, which are collectively called speech scenes in this study. The hierarchical modeling scheme allows an accurate description of the probability distribution of speech units in different speech scenes. To address the data sparsity problem, a decision-based clustering algorithm is presented to determine the set of speech scenes and to tie the substate parameters. Moreover, we propose the multiplex Viterbi algorithm to efficiently decode the synchronous HMMs within a search space of the same size as that of the standard HMMs. Experiments on the Aurora 2 task show that the synchronous HMMs produce a significant improvement in recognition performance over the HMM baseline at the expense of a moderate increase in memory requirement and computational complexity.
  • Keywords
    Gaussian distribution; decoding; hidden Markov models; maximum likelihood estimation; pattern clustering; speech recognition; Aurora 2 task; Gaussian component variables; HMM baseline; acoustic modeling framework; computational complexity; decision-based clustering algorithm; heterogeneous data sources; long-span nonphonetic attributes; memory requirement; multiplex Viterbi algorithm; probability distribution; speech scenes; synchronous HMM; synchronous hidden Markov models; Computational modeling; Decision trees; Multiplexing; Speech; Viterbi algorithm; system combination
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2013.6639101
  • Filename
    6639101