Title :
Leveraging speech production knowledge for improved speech recognition
Author :
Sangwan, Abhijeet ; Hansen, John H L
Author_Institution :
Center for Robust Speech Syst. (CRSS), Univ. of Texas at Dallas (UTD), Richardson, TX, USA
fDate :
Nov. 13 2009-Dec. 17 2009
Abstract :
This study presents a novel phonological methodology for speech recognition based on phonological features (PFs), which leverages the relationship between speech phonology and phonetics. In particular, the proposed scheme estimates the likelihood of observing speech phonology given an associative lexicon. In this manner, the scheme is capable of choosing the most likely hypothesis (word candidate) among a group of competing alternative hypotheses. The framework employs the maximum entropy (ME) model to learn the relationship between phonetics and phonology. Subsequently, we extend the ME model to an ME-HMM (maximum entropy-hidden Markov model), which captures the speech production and linguistic relationship between phonology and words. The proposed ME-HMM model is applied to the task of re-processing N-best lists, where absolute WRA (word recognition rate) increases of 1.7%, 1.9%, and 1% are reported for the TIMIT, NTIMIT, and SPINE (speech in noise) corpora, respectively (15.5% and 22.5% relative reductions in word error rate for TIMIT and NTIMIT, respectively).
Keywords :
hidden Markov models; maximum entropy methods; speech recognition; maximum entropy-hidden Markov model; phonological features; speech in noise corpora; speech phonetics; speech phonology; speech production knowledge; word recognition rate; Automatic speech recognition; Entropy; Error analysis; Government; Noise reduction; Resonance; Robustness; Speech enhancement
Conference_Titel :
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location :
Merano
Print_ISBN :
978-1-4244-5478-5
Electronic_ISBN :
978-1-4244-5479-2
DOI :
10.1109/ASRU.2009.5373368