Title :
Discriminative Training Using Non-Uniform Criteria for Keyword Spotting on Spontaneous Speech
Author :
Weng, Chao; Juang, Biing-Hwang Fred
Author_Institution :
Dept. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
Abstract :
In this work, we formulate the problem of keyword spotting as a non-uniform error automatic speech recognition (ASR) problem and propose a model training methodology based on the non-uniform minimum classification error (MCE) approach. The main idea is to adapt the fundamental MCE criterion to reflect the cost-sensitive notion that, in an automatic speech recognition task, errors on keywords are much more significant than errors on non-keywords. This notion of cost sensitivity leads to an emphasis on keyword models in parameter optimization. We then present a system that takes advantage of the weighted finite-state transducer (WFST) framework to implement the non-uniform MCE efficiently. To enhance non-uniform error cost minimization for keyword spotting, we further formulate a technique called "adaptive boosted non-uniform MCE", which incorporates the idea of boosting. We validate the proposed framework on two challenging large-scale spontaneous conversational telephone speech (CTS) datasets in two different languages (English and Mandarin). Experimental results show that our framework achieves consistent and significant spotting performance gains over both the maximum likelihood estimation (MLE) baseline and conventional discriminatively trained systems with uniform error cost.
Keywords :
finite state machines; optimisation; signal classification; speech recognition; ASR problem; CTS datasets; MCE approach; MLE; WFST framework; adaptive boosted nonuniform MCE technique; cost sensitivity; discriminative training; discriminatively-trained systems; keyword models; keyword spotting problem; large-scale spontaneous conversational telephone speech datasets; maximum likelihood estimation; nonuniform criteria; nonuniform error automatic speech recognition problem; nonuniform error cost minimization; nonuniform minimum classification error approach; parameter optimization; spontaneous speech; uniform error cost; weighted finite-state transducer; Hidden Markov models; Linear programming; Speech; Speech processing; Speech recognition; Training; Vocabulary; Discriminative training (DT); keyword spotting; minimum classification error (MCE); non-uniform criteria; weighted finite-state transducer (WFST)
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
DOI :
10.1109/TASLP.2014.2381931