Maximum Conditional Mutual Information Weighted Scoring for Speech Recognition

Author

Omar, Mohamed Kamal ; Ramaswamy, Ganesh N.

Author_Institution

IBM Thomas J. Watson Res. Center, Yorktown Heights, NY

Volume

1

fYear

2006

fDate

14-19 May 2006

Abstract

This paper describes a novel approach for extending the prototype Gaussian mixture model used to represent the classes in many recognition and classification systems, and its application to large-vocabulary automatic speech recognition (ASR). The extension estimates weighting vectors that are applied to the log likelihood values contributed by the different elements of the feature vector. The weighting vectors are chosen to maximize an estimate of the conditional mutual information between the log likelihood score and a binary random variable indicating whether the log likelihood was computed using the model of the correct label. The paper shows that, under some assumptions on the conditional probability density function (PDF) of the log likelihood score given this random variable, maximizing the differential entropy of a normalized log likelihood score is an equivalent criterion. This approach allows different features in the acoustic feature vector to be emphasized for different hidden Markov model (HMM) states. We apply the approach to the RT04 Arabic broadcast news speech recognition task and obtain a 3% relative improvement in word error rate (WER) over the baseline system.
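
The idea of weighting the log likelihood contributions of individual feature elements can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single diagonal-covariance Gaussian rather than a full mixture, and the function names and the weight values are hypothetical. How the weights are estimated (the conditional mutual information criterion) is not shown.

```python
import math


def per_dim_log_likelihood(x, mean, var):
    """Per-dimension log densities of a diagonal-covariance Gaussian.

    Assumption: one Gaussian with independent dimensions, so the total
    log likelihood is the plain sum of these per-dimension terms.
    """
    return [
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    ]


def weighted_score(x, mean, var, weights):
    """Weighted sum of the per-dimension log likelihoods.

    `weights` stands in for a state-specific weighting vector
    (hypothetical values; the estimation procedure is not modeled here).
    """
    terms = per_dim_log_likelihood(x, mean, var)
    return sum(w * t for w, t in zip(weights, terms))


# Illustrative values only.
x = [0.0, 1.0]
mean = [0.0, 0.0]
var = [1.0, 1.0]

unweighted = weighted_score(x, mean, var, [1.0, 1.0])  # ordinary log likelihood
first_only = weighted_score(x, mean, var, [1.0, 0.0])  # second feature ignored
```

With all weights equal to one, the score reduces to the ordinary diagonal-Gaussian log likelihood; setting a weight to zero removes that feature element from the score, which is the sense in which a per-state weighting vector can emphasize some features over others.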

Keywords

Gaussian processes; hidden Markov models; probability; speech recognition; Gaussian mixture model; HMM; RT04 Arabic broadcast news; binary random variable; differential entropy; hidden Markov model; mutual information weighted scoring; normalized log likelihood score; probability density function; vocabulary automatic speech recognition; word error rate; Automatic speech recognition; Broadcasting; Entropy; Hidden Markov models; Mutual information; Probability density function; Prototypes; Random variables; Speech recognition; Vocabulary;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on

Conference_Location

Toulouse

ISSN

1520-6149

Print_ISBN

1-4244-0469-X

Type

conf

DOI

10.1109/ICASSP.2006.1660011

Filename

1660011