Dialect Classification via Text-Independent Training and Testing for Arabic, Spanish, and Chinese

Author

Lei, Yun ; Hansen, John H L

Author_Institution

Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, TX, USA

Volume

19

Issue

1

fYear

2011

Firstpage

85

Lastpage

96

Abstract

Automatic dialect classification has emerged as an important area in the speech research field. Effective dialect classification is useful in developing robust speech systems, such as speech recognition and speaker identification. In this paper, two novel algorithms are proposed to improve dialect classification for text-independent spontaneous speech in Arabic and Spanish, along with probe results for Chinese. The problem considers the case where only dialect labels, and no transcripts, are available for the training and test data, and where speakers are speaking spontaneously; this is defined as text-independent dialect classification. The Gaussian mixture model (GMM) is used as the baseline system for text-independent dialect classification. The major motivation is to suppress confusable or distracting regions of the dialect language space and to emphasize the discriminative, sensitive information of the available dialects. In the training phase, a symmetric version of the Kullback-Leibler divergence is used to find the most discriminative GMM mixtures (KLD-GMM), so that the confusable acoustic GMM region is suppressed. For testing, the more discriminative frames are detected and used according to where the frames are located in the GMM mixture feature space, which is termed frame selection decoding (FSD-GMM). Both techniques, KLD-GMM and FSD-GMM, are shown to improve dialect classification performance for three-way dialect tasks. The two algorithms and their combination are evaluated on corpora of Arabic and Spanish dialects. Measurable improvement is achieved in both cases over a generalized maximum-likelihood estimation GMM baseline (MLE-GMM).
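As a rough illustration of the symmetric Kullback-Leibler divergence mentioned in the abstract, the sketch below scores pairs of Gaussian components from two dialect models and ranks them from most to least discriminative. It is not the paper's KLD-GMM procedure. The diagonal covariances, the one-to-one pairing of components across the two models, and the function names `kl_diag_gauss`, `symmetric_kl`, and `rank_mixtures` are all assumptions made for this example.

```python
import numpy as np


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance (closed form)."""
    mu_p, var_p, mu_q, var_q = (
        np.asarray(x, dtype=float) for x in (mu_p, var_p, mu_q, var_q)
    )
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )


def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetric KL divergence: KL(p || q) + KL(q || p)."""
    return kl_diag_gauss(mu_p, var_p, mu_q, var_q) + kl_diag_gauss(
        mu_q, var_q, mu_p, var_p
    )


def rank_mixtures(means_a, vars_a, means_b, vars_b):
    """Rank component indices by symmetric KL, largest divergence first.

    Assumption (hypothetical setup): component k of model A is paired with
    component k of model B, e.g. both models share a common initialization.
    """
    scores = np.array(
        [
            symmetric_kl(ma, va, mb, vb)
            for ma, va, mb, vb in zip(means_a, vars_a, means_b, vars_b)
        ]
    )
    order = np.argsort(-scores)  # descending: most discriminative first
    return order, scores
```

Under these assumptions, components with a small score are the ones where the two dialect models nearly coincide, which corresponds loosely to the confusable region that the abstract says is suppressed.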

Keywords

Gaussian processes; learning (artificial intelligence); maximum likelihood estimation; natural language processing; speech recognition; Gaussian mixture model; Kullback-Leibler divergence; automatic dialect classification; frame selection decoding; speaker identification; text-independent spontaneous speech; text-independent training; Acoustic signal detection; Acoustic testing; Automatic speech recognition; Loudspeakers; Maximum likelihood decoding; Natural languages; Probes; Robustness; Arabic dialects; Gaussian mixture; Spanish dialects; dialect classification; frame selection

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2010.2045184

Filename

5428854