Title :
Speech Recognition With Flat Direct Models
Author :
Nguyen, Patrick; Heigold, Georg; Zweig, Geoffrey
Author_Institution :
Microsoft Research, Redmond, WA, USA
Abstract :
This paper describes a novel direct modeling approach to speech recognition. We propose a log-linear modeling framework that uses numerous features, each of which measures some form of consistency between the underlying speech and an entire sequence of hypothesized words. Because the model relates the entire audio signal to a complete hypothesis without necessarily positing any inherent structure, we term it a flat direct model (FDM). In contrast to a conventional hidden Markov model approach, no Markov assumptions are made, and the model need not be sequential. We demonstrate features based both on template-matching distances and on the acoustic detection of multi-phone units, which are selected to have maximal mutual information with respect to the word labels. Further, we solve the key problem of how to define features that generalize to unseen word sequences. Within the proposed model, template-based features reduce the sentence error rate by 3% absolute relative to the baseline, and multi-phone-based features reduce it by 2% absolute.
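The log-linear idea in the abstract can be illustrated with a small sketch. Assuming the usual log-linear parameterization, the posterior of a word sequence W given audio X is P(W | X) proportional to exp(sum_i lambda_i * f_i(X, W)), where each feature f_i sees the whole hypothesis at once. The code below normalizes over a finite candidate list (such as an N-best list); that normalization choice is an assumption of this sketch, not something stated in the abstract. The feature functions (baseline_feature, unit_feature), unit names, lexicon, scores, and weights are all hypothetical placeholders, not the paper's template or multi-phone features, and weight training is not shown.

```python
import math
from typing import Callable, Dict, List, Sequence

# A feature maps (audio, complete word hypothesis) to a real number.
# It sees the whole sequence at once; no left-to-right factorization is assumed.
Feature = Callable[[object, Sequence[str]], float]


def log_linear_posteriors(
    audio: object,
    hypotheses: List[Sequence[str]],
    features: Dict[str, Feature],
    weights: Dict[str, float],
) -> List[float]:
    """Return P(W | X) for each hypothesis W in `hypotheses`.

    Assumption: normalization is over the given candidate list only
    (e.g., an N-best list), not over every possible word string.
    """
    # Linear score per hypothesis: sum_i lambda_i * f_i(X, W).
    scores = [
        sum(weights[name] * f(audio, words) for name, f in features.items())
        for words in hypotheses
    ]
    # Softmax with the max subtracted for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# --- Toy usage: every name and number below is hypothetical. ---
detected_units = {"k_ae_t", "s_ae_t"}  # pretend multi-phone detector output for X
baseline_scores = {("the", "cat", "sat"): -10.0, ("the", "cap", "sat"): -9.5}
lexicon = {"cat": "k_ae_t", "cap": "k_ae_p", "sat": "s_ae_t"}  # word -> unit


def baseline_feature(audio, words):
    # Stand-in for a conventional recognizer score of the full hypothesis.
    return baseline_scores[tuple(words)]


def unit_feature(audio, words):
    # Fraction of the hypothesis's expected units that were detected in the audio.
    expected = {lexicon[w] for w in words if w in lexicon}
    return len(expected & audio) / max(len(expected), 1)


hyps = [("the", "cat", "sat"), ("the", "cap", "sat")]
posteriors = log_linear_posteriors(
    detected_units,
    hyps,
    {"baseline": baseline_feature, "units": unit_feature},
    {"baseline": 1.0, "units": 2.0},
)
print([round(p, 3) for p in posteriors])  # → [0.622, 0.378]
```

Because every feature receives the full word sequence, nothing requires it to decompose into per-frame or per-word terms; this is the sense in which such a model drops the Markov assumption. In the toy run, the unit-detection feature overturns the baseline's preference for "the cap sat".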
Keywords :
acoustic signal detection; hidden Markov models; pattern matching; speech recognition; acoustic detection; flat direct model; hidden Markov model; hypothesized word sequence; log-linear modeling framework; multiphone unit; template-based feature; template-matching distance; Acoustics; Feature extraction; Markov processes; Mutual information; Statistical learning; Direct model; features; log-linear model; maximum mutual information (MMI)
Journal_Title :
IEEE Journal of Selected Topics in Signal Processing
DOI :
10.1109/JSTSP.2010.2080812