Title :
Query-by-example keyword spotting using long short-term memory networks
Author :
Chen, Guoguo ; Parada, Carolina ; Sainath, Tara N.
Author_Institution :
Center for Language & Speech Process., Johns Hopkins Univ., Baltimore, MD, USA
Abstract :
We present a novel approach to query-by-example keyword spotting (KWS) using a long short-term memory (LSTM) recurrent neural network-based feature extractor. In our approach, we represent each keyword using a fixed-length feature vector obtained by running the keyword audio through a word-based LSTM acoustic model. We use the activations prior to the softmax layer of the LSTM as our keyword vector. At runtime, we detect the keyword by extracting the same feature vector from a sliding window and computing a simple similarity score between this test vector and the keyword vector. With clean speech, we achieve an 86% relative reduction in false rejection rate at a 0.5% false alarm rate when compared to a competitive KWS system based on phoneme posteriorgrams with dynamic time warping, while the reduction in the presence of babble noise is 67%. Our system has a small memory footprint, low computational cost, and high precision, making it suitable for on-device applications.
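The detection step described in the abstract (a fixed-length vector per window, compared against the keyword vector) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: `mean_pool` is a hypothetical stand-in for the LSTM feature extractor, cosine similarity is one possible choice for the "simple similarity score" (the abstract does not name the measure), and `window`, `hop`, and `threshold` are made-up parameters.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]
Vector = List[float]


def mean_pool(frames: Sequence[Frame]) -> Vector:
    """Hypothetical stand-in for the LSTM extractor: average the frames
    into one fixed-length vector (NOT the method from the paper)."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity score; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_keyword(
    frames: Sequence[Frame],
    keyword_vector: Sequence[float],
    window: int,
    threshold: float,
    embed: Callable[[Sequence[Frame]], Vector] = mean_pool,
    hop: int = 1,
) -> List[Tuple[int, float]]:
    """Slide a window over the frames, embed each window with the same
    extractor used for the keyword, and keep windows scoring >= threshold."""
    hits = []
    for start in range(0, len(frames) - window + 1, hop):
        test_vector = embed(frames[start:start + window])
        score = cosine_similarity(test_vector, keyword_vector)
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy data: 2-dimensional "features"; the keyword example is two frames long.
keyword_vector = mean_pool([[1.0, 0.0], [1.0, 0.0]])
stream = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
hits = detect_keyword(stream, keyword_vector, window=2, threshold=0.9)
```

In this toy stream only the window starting at frame 2 matches the keyword example, so `hits` contains a single entry for that position. In a real system the `embed` argument would be replaced by the network's pre-softmax activations, and the threshold would be tuned to the target false alarm rate.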
Keywords :
acoustic signal processing; audio signal processing; feature extraction; query processing; recurrent neural nets; speech processing; LSTM recurrent neural network-based feature extractor; babble noise; dynamic time warping KWS system; fixed-length feature vector; long short-term memory network; query-by-example keyword spotting; relative false rejection rate reduction; sliding window; softmax layer; word-based LSTM acoustic model; Acoustics; Computational modeling; Feature extraction; Hidden Markov models; Noise; Speech; Speech processing
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178970