DocumentCode
1695002
Title
Mispronunciation detection via dynamic time warping on deep belief network-based posteriorgrams
Author
Lee, Ann ; Zhang, Yaodong ; Glass, James
Author_Institution
Artificial Intell. Lab., MIT Comput. Sci., Cambridge, MA, USA
fYear
2013
Firstpage
8227
Lastpage
8231
Abstract
In this paper, we explore the use of deep belief network (DBN) posteriorgrams as input to our previously proposed comparison-based system for detecting word-level mispronunciation. The system aligns a nonnative utterance with at least one native utterance and extracts features that describe the degree of misalignment from the aligned path and the distance matrix. We report system performance under different DBN training scenarios: pre-training and fine-tuning with either native data only or both native and nonnative data. Experimental results show that replacing the system input of MFCC or Gaussian posteriorgrams obtained in a fully unsupervised manner with DBN posteriorgrams improves system performance by at least 10.4% relative. Moreover, system performance remains steady when only 30% of the annotations are used.
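The alignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frame distance (negative log of the inner product of two posterior vectors), the three-way DTW step pattern, and the two features returned by the hypothetical helper misalignment_features are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def frame_distance(p, q, eps=1e-10):
    """Assumed distance between two posterior vectors: -log of their inner product."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))


def distance_matrix(native, nonnative):
    """Pairwise frame distances; rows index native frames, columns nonnative frames."""
    return [[frame_distance(p, q) for q in nonnative] for p in native]


def dtw_path(dist):
    """Standard DTW with steps (1,0), (0,1), (1,1); returns (path, total cost)."""
    n, m = len(dist), len(dist[0])
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    cost[0][0] = dist[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                cost[i - 1][j] if i > 0 else inf,
                cost[i][j - 1] if j > 0 else inf,
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            cost[i][j] = dist[i][j] + best
    # Backtrack from the end of both utterances to the start.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path, cost[n - 1][m - 1]


def misalignment_features(dist, path):
    """Hypothetical features: mean distance on the path and mean deviation from the diagonal."""
    n, m = len(dist), len(dist[0])
    mean_dist = sum(dist[i][j] for i, j in path) / len(path)
    deviation = sum(
        abs(i / max(n - 1, 1) - j / max(m - 1, 1)) for i, j in path
    ) / len(path)
    return {"mean_path_distance": mean_dist, "mean_diagonal_deviation": deviation}
```

Under these assumptions, each utterance is a list of per-frame posterior vectors. A path that stays close to the diagonal with low distances suggests the nonnative utterance tracks the native one, while large deviations or high distances would be candidate evidence of mispronunciation for a downstream classifier.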
Keywords
Boltzmann machines; Gaussian processes; feature extraction; matrix algebra; speech processing; DBN posteriorgrams; Gaussian posteriorgrams; MFCC; deep belief network-based posteriorgrams; distance matrix; dynamic time warping; feature extraction; nonnative utterance; word-level mispronunciation detection; Feature extraction; Mel frequency cepstral coefficient; Speech; Speech recognition; Support vector machines; System performance; Training; deep belief networks; dynamic time warping; mispronunciation detection;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location
Vancouver, BC
ISSN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2013.6639269
Filename
6639269