Title :
Robust audiovisual integration using semicontinuous hidden Markov models
Author :
Su, Qin ; Silsbee, Peter L.
Author_Institution :
Dept. of Electr. & Comput. Eng., Old Dominion Univ., Norfolk, VA, USA
Abstract :
We describe an improved method of integrating audio and visual information in a hidden Markov model (HMM) based audio-visual automatic speech recognition (ASR) system. The method uses a modified semi-continuous HMM (SCHMM) for integration and recognition. Our results show substantial improvements over earlier integration methods at high noise levels. Our integration method relies on the assumption that, as environmental conditions deviate from those under which training occurred, the underlying probability distributions also change. We use phoneme-based SCHMMs for classification of isolated words. The probability models underlying the standard SCHMM are Gaussian; thus, low probability estimates tend to be associated with high confidence (when the feature values lie in the tail of the distribution, small differences in those values cause large proportional differences in probabilities). Therefore, during classification, we replace each Gaussian with a scoring function that looks Gaussian near the mean of the distribution but has a heavier tail. We report results comparing this method with an audio-only system and with previous integration methods. At high noise levels, the system with modified scoring functions shows a better than 50% improvement; however, recognition does suffer when noise is low. Methods that can adjust the relative weight of the audio and visual information can still potentially outperform the new method, provided that a reliable way of choosing those weights can be found.
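The effect of a heavier tail can be illustrated with a minimal sketch. The abstract does not give the form of the scoring function, so the mixture used here (the original Gaussian plus a low-weight, much wider Gaussian) is purely an assumption for illustration, as are the names `heavy_tailed_score`, `eps`, and `widen`. The sketch compares how strongly each score discriminates between two feature values near the mean and between two values far out in the tail.

```python
import math


def gaussian_pdf(x, mean, std):
    """Standard univariate Gaussian density."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def heavy_tailed_score(x, mean, std, eps=0.05, widen=10.0):
    """Hypothetical scoring function (NOT the one from the paper).

    Mostly the original Gaussian, plus a small-weight Gaussian that is
    `widen` times broader and therefore dominates far from the mean.
    """
    narrow = gaussian_pdf(x, mean, std)
    broad = gaussian_pdf(x, mean, widen * std)
    return (1.0 - eps) * narrow + eps * broad


def log_ratio(score, a, b, mean=0.0, std=1.0):
    """Log of score(a) / score(b): how strongly the score prefers a over b."""
    return math.log(score(a, mean, std)) - math.log(score(b, mean, std))


# Two feature values near the mean, and two equally spaced values in the tail.
near_gauss = log_ratio(gaussian_pdf, 0.0, 0.5)
near_heavy = log_ratio(heavy_tailed_score, 0.0, 0.5)
tail_gauss = log_ratio(gaussian_pdf, 5.0, 6.0)
tail_heavy = log_ratio(heavy_tailed_score, 5.0, 6.0)

print(f"near mean: gaussian={near_gauss:.3f}  heavy-tailed={near_heavy:.3f}")
print(f"in tail:   gaussian={tail_gauss:.3f}  heavy-tailed={tail_heavy:.3f}")
```

Near the mean the two scores behave almost identically, while in the tail the Gaussian log-ratio is far larger than the heavy-tailed one. Under these assumed parameters, a small feature difference in the tail no longer produces a large proportional difference in score, which is the behavior the abstract describes.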
Keywords :
Gaussian distribution; acoustic noise; audio-visual systems; hidden Markov models; pattern classification; speech recognition; Gaussian probability distribution; audio-visual automatic speech recognition; environmental conditions; feature values; isolated word classification; noise levels; phoneme-based semi-continuous hidden Markov models; relative audio/visual information weight; robust audio-visual integration; scoring function; training; Acoustic noise; Automatic speech recognition; Background noise; Distortion; Hidden Markov models; Noise level; Noise robustness; Probability distribution; Speech recognition; Working environment noise
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607020