DocumentCode
1437162
Title
Dialect Classification via Text-Independent Training and Testing for Arabic, Spanish, and Chinese
Author
Lei, Yun ; Hansen, John H L
Author_Institution
Center for Robust Speech Syst. (CRSS), Univ. of Texas at Dallas, Richardson, TX, USA
Volume
19
Issue
1
fYear
2011
Firstpage
85
Lastpage
96
Abstract
Automatic dialect classification has emerged as an important area in the speech research field. Effective dialect classification is useful in developing robust speech systems, such as speech recognition and speaker identification. In this paper, two novel algorithms are proposed to improve dialect classification for text-independent spontaneous speech in Arabic and Spanish, along with probe results for Chinese. The problem considers the case where dialect labels but no transcripts are available for the training and test data, and speakers are speaking spontaneously; this is defined as text-independent dialect classification. The Gaussian mixture model (GMM) is used as the baseline system for text-independent dialect classification. The major motivation is to suppress confusable/distracting regions of the dialect language space and to emphasize the discriminative/sensitive information of the available dialects. In the training phase, a symmetric version of the Kullback-Leibler divergence is used to find the most discriminative GMM mixtures (KLD-GMM), so that the confusable acoustic GMM region is suppressed. For testing, the more discriminative frames are detected and used based on where the frames fall in the GMM mixture feature space, which is termed frame selection decoding (FSD-GMM). Both techniques, KLD-GMM and FSD-GMM, are shown to improve dialect classification performance for three-way dialect tasks. The two algorithms and their combination are evaluated on Arabic and Spanish dialect corpora. Measurable improvement is achieved in both cases over a generalized maximum-likelihood estimation GMM baseline (MLE-GMM).
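The symmetric Kullback-Leibler divergence mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the divergence is computed between mixtures, so the code below assumes the simplest case: the closed-form divergence between two single Gaussian components with diagonal covariances. The function names and the example values are hypothetical, and this is not presented as the authors' KLD-GMM procedure.

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two Gaussians with diagonal covariances.

    Assumption: each argument is a sequence of per-dimension means or
    variances, all of the same length, with strictly positive variances.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetric divergence: KL(p || q) + KL(q || p)."""
    return (kl_diag_gauss(mu_p, var_p, mu_q, var_q)
            + kl_diag_gauss(mu_q, var_q, mu_p, var_p))


# Hypothetical components from two dialect models (illustrative values only).
component_a = ([0.0, 1.0], [1.0, 2.0])   # (means, variances)
component_b = ([0.5, 1.5], [1.5, 1.0])

divergence = symmetric_kl(*component_a, *component_b)
```

Under this assumption, a larger value would mark a pair of components as easier to tell apart, and a value near zero would mark a confusable region; how such scores are then used to select mixtures is left to the paper itself.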
Keywords
Gaussian processes; learning (artificial intelligence); maximum likelihood estimation; natural language processing; speech recognition; Gaussian mixture model; Kullback-Leibler divergence; automatic dialect classification; frame selection decoding; maximum-likelihood estimation; speaker identification; speech recognition; text-independent spontaneous speech; text-independent training; Acoustic signal detection; Acoustic testing; Automatic speech recognition; Loudspeakers; Maximum likelihood decoding; Maximum likelihood estimation; Natural languages; Probes; Robustness; Speech recognition; Arabic dialects; Gaussian mixture; Kullback–Leibler divergence; Spanish dialects; dialect classification; frame selection;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
ieee
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2010.2045184
Filename
5428854