Modeling heterogeneous data sources for speech recognition using synchronous hidden Markov models

Author

Yong Zhao ; Biing-Hwang Juang

Author_Institution

Dept. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA

fYear

2013

Firstpage

7403

Lastpage

7407

Abstract

In this paper, we propose a novel acoustic modeling framework, the synchronous HMM, which takes full advantage of the capacity of heterogeneous data sources and achieves an optimal balance between modeling accuracy and robustness. The synchronous HMM introduces an additional layer of substates between the HMM states and the Gaussian component variables. The substates can register long-span non-phonetic attributes, which are collectively called speech scenes in this study. This hierarchical modeling scheme allows an accurate description of the probability distribution of speech units in different speech scenes. To address the data sparsity problem, a decision-based clustering algorithm is presented to determine the set of speech scenes and to tie the substate parameters. Moreover, we propose the multiplex Viterbi algorithm to efficiently decode synchronous HMMs within a search space of the same size as that of standard HMMs. Experiments on the Aurora 2 task show that synchronous HMMs significantly improve recognition performance over the HMM baseline, at the expense of a moderate increase in memory requirement and computational complexity.
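
The hierarchy described in the abstract (HMM state, then substate, then Gaussian components) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that each substate holds its own diagonal-covariance Gaussian mixture and a prior weight, that a known speech scene selects a single substate, and that an unknown scene is handled by marginalizing over substates. The function names, the scene labels "clean" and "noisy", and all parameter values are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def substate_loglik(x, gmm):
    """Log-likelihood of x under one substate's mixture.

    gmm is a list of (weight, mean, var) tuples, one per Gaussian component.
    """
    return logsumexp(
        [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    )


def state_loglik(x, substates, scene=None):
    """Emission log-likelihood of x for one HMM state.

    substates maps a scene label to (prior, gmm). If the scene is given,
    only that substate is scored (assumption); otherwise the substates are
    marginalized using their priors (also an assumption).
    """
    if scene is not None:
        _, gmm = substates[scene]
        return substate_loglik(x, gmm)
    return logsumexp(
        [math.log(prior) + substate_loglik(x, gmm)
         for prior, gmm in substates.values()]
    )


# Hypothetical one-dimensional state with two speech scenes.
example_state = {
    "clean": (0.5, [(1.0, [0.0], [1.0])]),
    "noisy": (0.5, [(1.0, [0.0], [4.0])]),
}

if __name__ == "__main__":
    print(state_loglik([0.0], example_state, scene="clean"))
    print(state_loglik([0.0], example_state))
```

In this sketch the state topology is left untouched: only the emission score depends on the scene. This is in the spirit of the abstract's statement that decoding uses a search space of the same size as for standard HMMs, although the multiplex Viterbi algorithm itself is not reproduced here.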

Keywords

Gaussian distribution; decoding; hidden Markov models; maximum likelihood estimation; pattern clustering; speech recognition; Aurora 2 task; Gaussian component variables; HMM baseline; acoustic modeling framework; computational complexity; decision-based clustering algorithm; heterogeneous data sources; long-span non-phonetic attributes; memory requirement; multiplex Viterbi algorithm; probability distribution; speech scenes; synchronous HMM; synchronous hidden Markov models; computational modeling; decision trees; multiplexing; speech; Viterbi algorithm; system combination

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location

Vancouver, BC

ISSN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2013.6639101

Filename

6639101