DocumentCode
542219
Title
Enhanced posteriors bias prediction for robust multi-stream ASR combining voicing and estimate reliabilities
Author
Glotin, Hervé
Author_Institution
ERSS - CNRS, 5 all. Machado; Toulouse Cedex 1 - France
Volume
1
fYear
2002
fDate
13-17 May 2002
Abstract
We discuss the fusion of speech and phoneme estimate reliabilities in a multi-stream Automatic Speech Recognizer (ASR) to improve ASR robustness. The Full Combination (FC) approach decomposes the full-band posterior probability of each phoneme into a reliability-weighted sum of combinations of stream posteriors. Previous studies show that the weighting factors in FC should take into account not only the reliability of the speech signal, but also the intrinsic efficiency of the subband experts. To control these two variables for each combination of posteriors, we derive a new model called “Posteriors Bias Prediction” (PBP), inspired by the Shannon Correction system. We show that FC is a specific type of PBP, and that PBP allows the integration of stream reliability based on the voicing level R (correlated with the signal-to-noise ratio) and the phoneme's class. Tests on telephone free digits (Numbers95) under various noise conditions and SNR levels demonstrate that PBP outperforms FC, J-RASTA, and Spectral Subtraction methods.
Keywords
Adaptation model; Hidden Markov models; Robustness; Signal to noise ratio; Speech processing; Speech recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
Conference_Location
Orlando, FL, USA
ISSN
1520-6149
Print_ISBN
0-7803-7402-9
Type
conf
DOI
10.1109/ICASSP.2002.5743717
Filename
5743717