• DocumentCode
    2790650
  • Title
    Language recognition using deep-structured conditional random fields
  • Author
    Yu, Dong ; Wang, Shizhen ; Karam, Zahi ; Deng, Li
  • Author_Institution
    Microsoft Res., Redmond, WA, USA
  • fYear
    2010
  • fDate
    14-19 March 2010
  • Firstpage
    5030
  • Lastpage
    5033
  • Abstract
    We present a novel language identification technique using our recently developed deep-structured conditional random fields (CRFs). The deep-structured CRF is a multi-layer CRF model in which each higher layer's input observation sequence consists of the lower layer's observation sequence and the resulting lower layer's frame-level marginal probabilities. In this paper we extend the original deep-structured CRF by allowing for distinct state representations at different layers and demonstrate its benefits. We propose an unsupervised algorithm to pre-train the intermediate layers by casting it as a multi-objective programming problem that is aimed at minimizing the average frame-level conditional entropy while maximizing the state occupation entropy. Empirical evaluation on a seven-language/dialect voice mail routing task showed that our approach can achieve a routing accuracy (RA) of 86.4% and average equal error rate (EER) of 6.6%. These results are significantly better than the 82.5% RA and 7.5% average EER obtained using the Gaussian mixture model trained with the maximum mutual information criterion but slightly worse than the 87.7% RA and 6.4% EER achieved using the support vector machine with model pushing on the Gaussian super vector (GSV).
  • Keywords
    entropy; random processes; speech recognition; support vector machines; voice mail; Gaussian super vector; deep-structured conditional random fields; equal error rate; frame-level conditional entropy; frame-level marginal probabilities; language identification technique; language recognition; multiobjective programming problem; observation sequence; state occupation entropy; support vector machine; voice mail routing task; Automatic speech recognition; Casting; Entropy; Error analysis; Mutual information; Routing; Support vector machine classification; Support vector machines; Unsupervised learning; Voice mail; conditional random field; deep learning; deep-structure; language identification; unsupervised learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-4295-9
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2010.5495072
  • Filename
    5495072
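
The abstract describes two ideas that a small sketch may make more concrete: stacking each layer's frame-level marginal probabilities onto the observation sequence to form the next layer's input, and pre-training by trading off low average frame-level conditional entropy against high state occupation entropy. The Python below is only an illustration under stated assumptions, not the authors' implementation: the function names, the plain-list data layout, and in particular the scalarization of the two objectives into one weighted difference (the `weight` parameter) are hypothetical and do not come from the record.

```python
import math


def conditional_entropy(posteriors):
    """Average per-frame entropy (in nats) of the state posteriors."""
    total = 0.0
    for frame in posteriors:
        total -= sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(posteriors)


def occupation_entropy(posteriors):
    """Entropy (in nats) of the state occupancy averaged over all frames."""
    num_frames = len(posteriors)
    num_states = len(posteriors[0])
    occupancy = [
        sum(frame[s] for frame in posteriors) / num_frames
        for s in range(num_states)
    ]
    return -sum(q * math.log(q) for q in occupancy if q > 0.0)


def pretraining_objective(posteriors, weight=1.0):
    """Assumed scalarization: smaller is better.

    Low conditional entropy means confident per-frame decisions; high
    occupation entropy means the states are used evenly. Combining them
    as a weighted difference is an assumption made for this sketch.
    """
    return conditional_entropy(posteriors) - weight * occupation_entropy(posteriors)


def next_layer_input(observations, posteriors):
    """Append each frame's marginals to that frame's observation vector."""
    return [list(obs) + list(post) for obs, post in zip(observations, posteriors)]
```

Under this assumed objective, posteriors that are confident on every frame and spread evenly across states score lowest, while posteriors that collapse onto one state or stay uniform on every frame score higher.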