Discriminatively Trained GMMs for Language Classification Using Boosting Methods

Author

Siu, Man-Hung ; Yang, Xi ; Gish, Herbert

Author_Institution

Speech & Language Process. Dept., BBN Technol., Cambridge, MA

Volume

17

Issue

1

fYear

2009

Firstpage

187

Lastpage

197

Abstract

In language identification and other speech applications, discriminatively trained models often outperform nondiscriminative models trained with the maximum-likelihood criterion. For instance, discriminative Gaussian mixture models (GMMs) are typically trained by optimizing some discriminative criteria that can be computationally expensive and complex to implement. In this paper, we explore a novel approach to discriminative GMM training by using a variant of the boosting framework (R. Schapire, "The boosting approach to machine learning, an overview," Proc. MSRI Workshop on Nonlinear Estimation and Classification, 2002) from machine learning, in which an ensemble of GMMs is trained sequentially. We have extended the purview of boosting to class-conditional models (as opposed to discriminative models such as classification trees). The effectiveness of our boosting variation comes from the emphasis on working with the misclassified data to achieve discriminatively trained models. Our variant of boosting also uses low-confidence classifications, in addition to misclassified examples, in classifier generation. We further apply our boosting approach to anti-models to achieve additional performance gains. We have applied our discriminative training approach to a variety of language identification experiments using the 12-language NIST 2003 language identification task, and we show the significant performance improvements that can be obtained. The experiments include both acoustic and token-based speech models. Our best-performing boosted GMM-based system on the 12-language verification task has a 2.3% equal error rate (EER).

Keywords

Gaussian processes; maximum likelihood estimation; natural language processing; pattern classification; 12-language NIST 2003 language identification task; boosting methods; discriminative Gaussian mixture models; discriminatively trained GMM; language classification; language identification; low confidence data classifications; maximum-likelihood criterion; Boosting; Classification tree analysis; Logistics; Machine learning; NIST; Natural languages; Parameter estimation; Performance gain; Speech processing; discriminative training

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2008.2006653

Filename

4740154