Title :
Making talking heads and speechreading with computers
Author :
Brooke, N.M. ; Scott, S.D. ; Tomlinson, M.J.
Author_Institution :
Media Technol. Res. Centre, Bath Univ., UK
Abstract :
Seeing the face of a speaker has been shown to be equivalent to an increase of 8-12 dB in the SNR of speech presented against a noise background. At moderate noise levels, this gain can be significant. The cues conveyed by visible facial movements are essentially cues to the place of articulation, which are precisely the acoustic cues most rapidly degraded by noise. Conversely, cues to the manner of articulation are often not visible, but are acoustically robust, even in noise. Visible and audible speech cues are therefore largely complementary. Increasingly widespread efforts are now being made to explore the benefits of exploiting visible facial speech movements in automatic speech processing systems. For example, automatic speech recognition in noisy environments may be enhanced by augmenting the acoustic inputs to conventional recognizers with data about the visible speech gestures. Aircraft cockpits are just one demanding environment where reliable speech recognition is becoming important. Conversely, video speech synthesis, i.e. the computer graphics synthesis of talking heads, might help in the investigation, assessment and improvement of the speech-reading skills that are so important to the hearing-impaired. Realistic animated graphical displays of visible facial speech articulations could be used for the interactive presentation of controllable and flexible visual stimuli in response to a learner's progress in speech-reading. At present, training relies largely upon live speakers, who cannot easily control their gestures, or pre-recorded libraries of images, which cannot easily be altered.
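To put the 8-12 dB figure in perspective, here is a minimal sketch (not from the paper) that converts a decibel SNR gain into the equivalent signal-to-noise power ratio, using the standard definition SNR_dB = 10 log10(P_signal / P_noise):

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a gain in decibels to the equivalent power ratio,
    per the standard definition SNR_dB = 10 * log10(P_s / P_n)."""
    return 10 ** (db / 10)

# The 8-12 dB benefit of seeing the speaker's face corresponds to
# roughly a 6x to 16x improvement in signal-to-noise power ratio.
for gain_db in (8, 12):
    print(f"{gain_db} dB -> {db_to_power_ratio(gain_db):.1f}x power ratio")
```

In other words, the visual channel contributes as much as reducing the noise power to between roughly a sixth and a sixteenth of its original level.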
Keywords :
handicapped aids; acoustic inputs; acoustically robust cues; aircraft cockpits; animated graphical displays; automatic speech processing systems; computer graphics; computers; facial speech movements; hearing-impaired people; interactive presentation; noise background; speech cues; speech recognition; speech-reading; talking heads; training; video speech synthesis; visible facial movements; visible facial speech articulations; visible speech gestures; visual cues; visual stimuli;
Conference_Titel :
IEE Colloquium on Integrated Audio-Visual Processing for Recognition, Synthesis and Communication (Digest No: 1996/213)
Conference_Location :
London
DOI :
10.1049/ic:19961146