Title :
Tree-Based Covariance Modeling of Hidden Markov Models
Author :
Tian, Ye ; Zhou, Jian-lai ; Lin, Hui ; Jiang, Hui
Author_Institution :
Div. of Speech & Natural Language, Microsoft Corp., Redmond, WA
Abstract :
In this paper, we present a tree-based, full covariance hidden Markov modeling technique for automatic speech recognition. A multilayered tree is first built to organize all covariance matrices into a hierarchical structure. The Kullback-Leibler divergence is used during tree building to measure inter-Gaussian distortion, and successive splitting is used to construct the multilayer covariance tree. To cope with data sparseness in estimating a full covariance matrix, we interpolate the diagonal covariance matrix of a leaf node at the bottom of the tree with the full covariances of its parent and ancestors along the path up to the root node. The interpolation coefficients are estimated in the maximum likelihood sense via the EM algorithm. The interpolation is performed in three different parametric forms: 1) the inverse covariance matrix, 2) the covariance matrix, and 3) the off-diagonal terms of the full covariance matrix. The proposed algorithm is tested on three databases: 1) DARPA Resource Management (RM), 2) Switchboard, and 3) a Chinese dictation database. On all three databases, we show that the proposed tree-based full covariance modeling consistently performs better than the baseline diagonal covariance modeling. It also outperforms other covariance modeling techniques, including 1) semi-tied covariance (STC) modeling, 2) heteroscedastic linear discriminant analysis (HLDA), 3) mixtures of inverse covariance (MIC), and 4) direct full covariance modeling.
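The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names kl_gaussian and interpolate_covariance are hypothetical, the weights are fixed by hand rather than estimated with EM, and the constraint that the weights are non-negative and sum to one is an assumption made here. Only the second parametric form (interpolation in the covariance domain) is shown.

```python
import numpy as np

def kl_gaussian(mu0, cov0, mu1, cov1):
    """Closed-form KL divergence KL(N0 || N1) between two multivariate Gaussians."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    # slogdet is numerically safer than log(det(...)) for small determinants.
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d + logdet1 - logdet0)

def interpolate_covariance(leaf_diag, ancestor_full, weights):
    """Weighted sum of a leaf's diagonal covariance and its ancestors' full covariances.

    leaf_diag:     1-D array of leaf variances.
    ancestor_full: full covariance matrices ordered from parent up to root.
    weights:       one weight per matrix, leaf first (assumed non-negative, summing to 1).
    """
    mats = [np.diag(leaf_diag)] + list(ancestor_full)
    weights = np.asarray(weights, dtype=float)
    if len(weights) != len(mats):
        raise ValueError("need one weight per covariance matrix on the path")
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights are assumed non-negative and summing to 1")
    return sum(w * m for w, m in zip(weights, mats))

# Toy two-dimensional example with made-up numbers.
leaf_diag = np.array([1.0, 2.0])
parent = np.array([[1.5, 0.5], [0.5, 2.5]])
root = np.array([[2.0, 1.0], [1.0, 3.0]])
smoothed = interpolate_covariance(leaf_diag, [parent, root], [0.5, 0.3, 0.2])
```

Because every matrix on the path is positive definite and the weights are non-negative, the interpolated matrix stays positive definite, while its off-diagonal terms are borrowed from ancestors that pool more training data than the leaf.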
Keywords :
Gaussian processes; covariance matrices; expectation-maximisation algorithm; hidden Markov models; interpolation; speech recognition; trees (mathematics); Chinese dictation; DARPA resource management; Kullback-Leibler divergence; automatic speech recognition applications; direct full covariance modeling; expectation-maximization algorithm; heteroscedastic linear discriminant analysis; inter-Gaussian distortion; interpolation coefficients; maximum likelihood sense; mixtures of inverse covariance; multilayered tree; semi-tied covariance modeling; switchboard; tree-based covariance modeling; Automatic speech recognition; Covariance matrix; Databases; Distortion measurement; Maximum likelihood estimation; Nonhomogeneous media; Resource management; Testing; Gaussian mixture models; covariance modeling; tree modeling
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TSA.2005.863210