Title :
Unsupervised neural network based feature extraction using weak top-down constraints
Author :
Kamper, Herman ; Elsner, Micha ; Jansen, Aren ; Goldwater, Sharon
Author_Institution :
Sch. of Inf., Univ. of Edinburgh, Edinburgh, UK
Abstract :
Deep neural networks (DNNs) have become a standard component in supervised ASR, used in both data-driven feature extraction and acoustic modelling. Supervision is typically obtained from a forced alignment that provides phone class targets, requiring transcriptions and pronunciations. We propose a novel unsupervised DNN-based feature extractor that can be trained without these resources in zero-resource settings. Using unsupervised term discovery, we find pairs of isolated word examples of the same unknown type; these provide weak top-down supervision. For each pair, dynamic programming is used to align the feature frames of the two words. Matching frames are presented as input-output pairs to a deep autoencoder (AE) neural network. Using this AE as a feature extractor in a word discrimination task, we achieve a 64% relative improvement over a previous state-of-the-art system and a 57% improvement relative to a bottom-up trained deep AE, and we come to within 23% of a supervised system.
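The frame-alignment step described in the abstract can be illustrated with a minimal sketch. It assumes the dynamic programming is a standard dynamic time warping (DTW) with a Euclidean frame distance, and that each aligned frame pair is used in both directions as an (input, target) example; the abstract does not specify any of these details. The names `dtw_align` and `frame_pairs` are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature frames (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(seq_a, seq_b, dist=euclidean):
    """Align two frame sequences with DTW; return a list of (i, j) index pairs."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = minimum cumulative cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # diagonal step
                                 cost[i - 1][j],      # advance in seq_a only
                                 cost[i][j - 1])      # advance in seq_b only
    # Backtrack from the end of both sequences to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return path


def frame_pairs(seq_a, seq_b):
    """Turn one discovered word pair into (input, target) frame pairs.

    Assumption: each aligned pair is emitted in both directions.
    """
    pairs = []
    for i, j in dtw_align(seq_a, seq_b):
        pairs.append((seq_a[i], seq_b[j]))
        pairs.append((seq_b[j], seq_a[i]))
    return pairs


# Toy one-dimensional "frames" standing in for real acoustic features.
word_a = [[0.0], [1.0], [2.0]]
word_b = [[0.0], [1.0], [1.0], [2.0]]
training_pairs = frame_pairs(word_a, word_b)
```

In such a setup, the resulting pairs would be pooled over all discovered word pairs and used as training data for the autoencoder, with the first element as input and the second as the reconstruction target.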
Keywords :
feature extraction; neural nets; speech recognition; acoustic modelling; autoencoder neural network; automatic speech recognition; data-driven feature extraction; deep neural networks; feature extractor; top-down constraints; unsupervised neural network; word discrimination task; Artificial neural networks; Gold; Speech; Standards; Training; Unsupervised feature extraction; zero-resource speech processing
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7179087