Title :
Mispronunciation detection via dynamic time warping on deep belief network-based posteriorgrams
Author :
Ann Lee ; Yaodong Zhang ; James Glass
Author_Institution :
Artificial Intell. Lab., MIT Comput. Sci., Cambridge, MA, USA
Abstract :
In this paper, we explore the use of deep belief network (DBN) posteriorgrams as input to our previously proposed comparison-based system for detecting word-level mispronunciation. The system aligns a nonnative utterance with at least one native utterance and extracts features that describe the degree of misalignment from the aligned path and the distance matrix. We report system performance under different DBN training scenarios: pre-training and fine-tuning with either native data only or both native and nonnative data. Experimental results show that replacing the system input, either MFCCs or Gaussian posteriorgrams obtained in a fully unsupervised manner, with DBN posteriorgrams yields a relative improvement in system performance of at least 10.4%. Moreover, system performance remains steady when only 30% of the annotations are used.
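The alignment step described in the abstract can be illustrated with a minimal sketch. It assumes each utterance is already a posteriorgram (a frames x classes matrix whose rows sum to 1). The negative-log inner-product distance, the function names, and the three features are illustrative assumptions, not the feature set used in the paper.

```python
import numpy as np

def distance_matrix(native, nonnative, eps=1e-10):
    """Frame-by-frame distances between two posteriorgrams.
    The -log inner product is an assumed choice of distance."""
    return -np.log(np.clip(native @ nonnative.T, eps, None))

def dtw_path(dist):
    """Standard DTW with diagonal, vertical, and horizontal steps.
    Returns the aligned path as (native_frame, nonnative_frame) pairs
    and the accumulated cost."""
    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the last cell to the first.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((acc[i - 1, j - 1], i - 1, j - 1),
                      (acc[i - 1, j], i - 1, j),
                      (acc[i, j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return path, float(acc[n, m])

def misalignment_features(dist, path):
    """Hypothetical example features taken from the path and the matrix."""
    n, m = dist.shape
    costs = np.array([dist[i, j] for i, j in path])
    # Deviation of the path from the ideal diagonal, in normalized time.
    dev = np.array([abs(i / max(n - 1, 1) - j / max(m - 1, 1))
                    for i, j in path])
    steps = np.diff(np.array(path), axis=0)
    # Fraction of steps that are purely horizontal or vertical.
    non_diag = float(np.mean(np.any(steps == 0, axis=1))) if len(steps) else 0.0
    return {"mean_path_cost": float(costs.mean()),
            "max_diag_dev": float(dev.max()),
            "non_diag_ratio": non_diag}
```

In this sketch, a word would be scored by computing `distance_matrix` between a native and a nonnative posteriorgram, running `dtw_path`, and passing the resulting features to a classifier. A path that stays close to the diagonal with low cost suggests the two pronunciations match; long horizontal or vertical runs suggest misalignment.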
Keywords :
Boltzmann machines; Gaussian processes; feature extraction; matrix algebra; speech processing; DBN posteriorgrams; Gaussian posteriorgrams; MFCC; deep belief network-based posteriorgrams; distance matrix; dynamic time warping; nonnative utterance; word-level mispronunciation detection; Mel frequency cepstral coefficient; Speech; Speech recognition; Support vector machines; System performance; Training; deep belief networks; mispronunciation detection
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639269