DocumentCode
2790650
Title
Language recognition using deep-structured conditional random fields
Author
Yu, Dong ; Wang, Shizhen ; Karam, Zahi ; Deng, Li
Author_Institution
Microsoft Res., Redmond, WA, USA
fYear
2010
fDate
14-19 March 2010
Firstpage
5030
Lastpage
5033
Abstract
We present a novel language identification technique using our recently developed deep-structured conditional random fields (CRFs). The deep-structured CRF is a multi-layer CRF model in which each higher layer's input observation sequence consists of the lower layer's observation sequence together with the lower layer's resulting frame-level marginal probabilities. In this paper we extend the original deep-structured CRF by allowing for distinct state representations at different layers and demonstrate its benefits. We propose an unsupervised algorithm to pre-train the intermediate layers by casting the pre-training as a multi-objective programming problem that aims to minimize the average frame-level conditional entropy while maximizing the state occupation entropy. Empirical evaluation on a seven-language/dialect voice mail routing task showed that our approach can achieve a routing accuracy (RA) of 86.4% and an average equal error rate (EER) of 6.6%. These results are significantly better than the 82.5% RA and 7.5% average EER obtained using the Gaussian mixture model trained with the maximum mutual information criterion, but slightly worse than the 87.7% RA and 6.4% EER achieved using the support vector machine with model pushing on the Gaussian super vector (GSV).
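The abstract describes two mechanisms: each higher layer observes the lower layer's observations plus its frame-level marginals, and the intermediate layers are pre-trained to make frame assignments confident (low average frame-level conditional entropy) while keeping all states in use (high state occupation entropy). The Python sketch below illustrates both under stated assumptions. The frame-level marginals are taken as given (in the model they would come from the lower-layer CRF; that inference step is not shown). All function names are hypothetical. Combining the two entropy terms as a weighted difference with `weight` is an assumed scalarization: the abstract says only that pre-training is cast as a multi-objective programming problem, not how that problem is solved.

```python
import math
from typing import List, Sequence


def stack_layer_input(observations: Sequence[Sequence[float]],
                      marginals: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: build the next layer's observation at frame t by
    appending the lower layer's marginals p_t(k) to its observation x_t."""
    if len(observations) != len(marginals):
        raise ValueError("observations and marginals must have the same number of frames")
    return [list(x) + list(p) for x, p in zip(observations, marginals)]


def _entropy(dist: Sequence[float]) -> float:
    # Shannon entropy in nats; terms with p == 0 contribute 0 (0 * log 0 := 0).
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def avg_frame_conditional_entropy(marginals: Sequence[Sequence[float]]) -> float:
    # (1/T) * sum_t H(p_t): small when every frame is assigned confidently to one state.
    return sum(_entropy(p) for p in marginals) / len(marginals)


def state_occupation_entropy(marginals: Sequence[Sequence[float]]) -> float:
    # Entropy of the average occupancy q(k) = (1/T) * sum_t p_t(k):
    # large when all states are used, which discourages collapsing onto one state.
    num_frames = len(marginals)
    num_states = len(marginals[0])
    occupancy = [sum(p[k] for p in marginals) / num_frames for k in range(num_states)]
    return _entropy(occupancy)


def pretraining_objective(marginals: Sequence[Sequence[float]], weight: float = 1.0) -> float:
    # ASSUMED weighted-sum scalarization of the two objectives (lower is better).
    return avg_frame_conditional_entropy(marginals) - weight * state_occupation_entropy(marginals)
```

For two frames and two states, confident and balanced marginals `[[1.0, 0.0], [0.0, 1.0]]` give a conditional entropy of 0 and an occupation entropy of ln 2, so the objective is about -0.693. Collapsed marginals `[[1.0, 0.0], [1.0, 0.0]]` and uniform marginals `[[0.5, 0.5], [0.5, 0.5]]` both score 0, so the sketch prefers the confident, balanced assignment, which matches the stated goal of the two entropy terms.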
Keywords
entropy; random processes; speech recognition; support vector machines; voice mail; Gaussian super vector; deep-structured conditional random fields; equal error rate; frame-level conditional entropy; frame-level marginal probabilities; language identification technique; language recognition; multiobjective programming problem; observation sequence; state occupation entropy; support vector machine; voice mail routing task; Automatic speech recognition; Casting; Entropy; Error analysis; Mutual information; Routing; Support vector machine classification; Support vector machines; Unsupervised learning; Voice mail; conditional random field; deep learning; deep-structure; language identification; unsupervised learning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location
Dallas, TX
ISSN
1520-6149
Print_ISBN
978-1-4244-4295-9
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2010.5495072
Filename
5495072