Title :
Contrastive auto-encoder for phoneme recognition
Author :
Xin Zheng; Zhiyong Wu; Hsiang-Yun Meng; Lianhong Cai
Author_Institution :
Tsinghua-CUHK Joint Res. Center for Media Sci., Tsinghua Univ., Shenzhen, China
Abstract :
Speech data typically contains task-irrelevant information mixed within its features. Specifically, phonetic information, speaker characteristics, emotional information, and noise are always mixed together and tend to impair one another for a given task. We propose a new type of auto-encoder for feature learning called the contrastive auto-encoder. Unlike other variants of auto-encoders, the contrastive auto-encoder is able to leverage class labels in constructing its representation layer. We achieve this by modeling two auto-encoders together and making their differences contribute to the total loss function. The transformation built with the contrastive auto-encoder can be seen as a task-specific and invariant feature learner. Our experiments on TIMIT clearly show the superiority of the features extracted by the contrastive auto-encoder over the original acoustic features, features extracted by a deep auto-encoder, and features extracted by the model from which the contrastive auto-encoder originates.
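To make the "two auto-encoders whose difference contributes to the loss" idea concrete, below is a minimal illustrative sketch, not the paper's exact formulation: two tied auto-encoders process a pair of frames sharing the same phoneme label, and the squared distance between their hidden codes is added to the reconstruction losses, encouraging label-specific, nuisance-invariant representations. All names (AutoEncoder, contrastive_ae_loss), the 39-dimensional input, and the weighting term alpha are assumptions made for illustration.

```python
# Hypothetical sketch of a contrastive auto-encoder objective (assumed form,
# not taken from the paper): reconstruct both inputs of a same-class pair and
# penalize the difference between their hidden representations.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=39, hid_dim=256):
        super().__init__()
        self.enc = nn.Linear(in_dim, hid_dim)
        self.dec = nn.Linear(hid_dim, in_dim)

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))   # hidden representation layer
        x_rec = self.dec(h)              # reconstruction of the input
        return h, x_rec

def contrastive_ae_loss(model, x1, x2, alpha=1.0):
    """Reconstruction loss for both inputs of a same-class pair plus a
    penalty on the difference between the two hidden codes."""
    h1, r1 = model(x1)
    h2, r2 = model(x2)
    rec = nn.functional.mse_loss(r1, x1) + nn.functional.mse_loss(r2, x2)
    diff = nn.functional.mse_loss(h1, h2)   # "difference" term between the two auto-encoders
    return rec + alpha * diff

# Toy usage: two batches of 39-dim acoustic frames assumed to share phoneme labels.
model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x1, x2 = torch.randn(32, 39), torch.randn(32, 39)
loss = contrastive_ae_loss(model, x1, x2)
loss.backward()
opt.step()
```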
Keywords :
feature extraction; speech processing; speech recognition; acoustic feature; contrastive auto-encoder; emotional information; feature extraction; invariant feature learner; phoneme recognition; phonetic information; representation layer; speaker characteristic information; speech data; task irrelevant information; task-specific learner; Acoustics; DNA; Data models; Feature extraction; Hidden Markov models; Neurons; Training; auto-encoder; contrastive auto-encoder; deep neural network (DNN); phoneme recognition; restricted Boltzmann machine (RBM)
Conference_Title :
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Florence, Italy
DOI :
10.1109/ICASSP.2014.6854056