• DocumentCode
    1437162
  • Title
    Dialect Classification via Text-Independent Training and Testing for Arabic, Spanish, and Chinese
  • Author
    Lei, Yun ; Hansen, John H L
  • Author_Institution
    Center for Robust Speech Syst. (CRSS), Univ. of Texas at Dallas, Richardson, TX, USA
  • Volume
    19
  • Issue
    1
  • fYear
    2011
  • Firstpage
    85
  • Lastpage
    96
  • Abstract
    Automatic dialect classification has emerged as an important area of speech research. Effective dialect classification is useful in developing robust speech systems, such as speech recognition and speaker identification. In this paper, two novel algorithms are proposed to improve dialect classification for text-independent spontaneous speech in Arabic and Spanish, along with probe results for Chinese. The problem considers the case where dialect labels but no transcripts are available for the training and test data, and the speakers are speaking spontaneously; this is defined as text-independent dialect classification. The Gaussian mixture model (GMM) is used as the baseline system for text-independent dialect classification. The major motivation is to suppress confusable/distracting regions of the dialect language space and to emphasize the discriminative/sensitive information of the available dialects. In the training phase, a symmetric version of the Kullback-Leibler divergence is used to find the most discriminative GMM mixtures (KLD-GMM), so that the confusable acoustic GMM regions are suppressed. For testing, the more discriminative frames are detected and used according to where they fall in the GMM mixture feature space, which is termed frame selection decoding (FSD-GMM). The KLD-GMM and FSD-GMM techniques are shown to improve dialect classification performance for three-way dialect tasks. The two algorithms and their combination are evaluated on Arabic and Spanish dialect corpora. Measurable improvement is achieved in both cases over a generalized maximum-likelihood estimation GMM baseline (MLE-GMM).
  • Keywords
    Gaussian processes; learning (artificial intelligence); maximum likelihood estimation; natural language processing; speech recognition; Gaussian mixture model; Kullback-Leibler divergence; automatic dialect classification; frame selection decoding; maximum-likelihood estimation; speaker identification; speech recognition; text-independent spontaneous speech; text-independent training; Acoustic signal detection; Acoustic testing; Automatic speech recognition; Loudspeakers; Maximum likelihood decoding; Maximum likelihood estimation; Natural languages; Probes; Robustness; Speech recognition; Arabic dialects; Gaussian mixture; Kullback–Leibler divergence; Spanish dialects; dialect classification; frame selection;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2010.2045184
  • Filename
    5428854
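
  • Illustration
    The abstract names a symmetric Kullback-Leibler divergence as the criterion for finding discriminative GMM mixtures but gives no formula. The sketch below is one plausible reading, not the authors' implementation. It assumes diagonal-covariance Gaussians and that the mixtures of two dialect GMMs are index-aligned (for example, adapted from a shared model); both assumptions, and the names kl_diag_gauss, symmetric_kl, and rank_mixtures, are hypothetical and do not come from the record.

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetric divergence: KL(p || q) + KL(q || p)."""
    return (kl_diag_gauss(mu_p, var_p, mu_q, var_q)
            + kl_diag_gauss(mu_q, var_q, mu_p, var_p))


def rank_mixtures(gmm_a, gmm_b):
    """Rank mixture indices from most to least divergent.

    gmm_a, gmm_b: lists of (mean, variance) pairs, one per mixture.
    Assumption: mixture k in gmm_a corresponds to mixture k in gmm_b.
    """
    scores = [symmetric_kl(ma, va, mb, vb)
              for (ma, va), (mb, vb) in zip(gmm_a, gmm_b)]
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)


# Toy example: mixture 0 is identical in both dialects, mixture 1 differs.
dialect_a = [([0.0], [1.0]), ([0.0], [1.0])]
dialect_b = [([0.0], [1.0]), ([2.0], [1.0])]
ranking = rank_mixtures(dialect_a, dialect_b)
```

    In this toy case the mixture whose means differ ranks first, and the mixture shared by both dialects has zero divergence, which corresponds to the kind of confusable region the abstract says is suppressed.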