• DocumentCode
    1995742
  • Title
    VidTIMIT audio visual phoneme recognition using AAM visual features and human auditory motivated acoustic wavelet features
  • Author
    Biswas, Astik ; Sahu, P.K. ; Bhowmick, Anirban ; Chandra, Mahesh
  • Author_Institution
    Dept. of Electr. Eng., Nat. Inst. of Technol., Rourkela, India
  • fYear
    2015
  • fDate
    9-11 July 2015
  • Firstpage
    428
  • Lastpage
    433
  • Abstract
    This paper presents an audio-visual phoneme recognition system that uses shape and appearance information extracted from the jaw and lip region to improve robustness in noisy environments. Using visual features alongside traditional acoustic features has been found to be promising in complex auditory environments. The visual modality can provide complementary information to the speech recognizer when the audio modality is badly affected by background noise. The acoustic modality is represented by auditory-based, equivalent rectangular bandwidth (ERB)-like wavelet features (WERBC), whereas the visual modality is represented by statistically powerful active appearance model (AAM) based features. The audio and visual modalities are fused using a proportional weighting factor to form a two-stream audio-visual synchronous hidden Markov model (SHMM) recognizer. The VidTIMIT database is used to study the performance of the multimodal phoneme recognition system. Artificial noise is injected into the audio files at different SNR levels (0 dB to 20 dB) to study the system's performance in noisy environments. The combination of WERBC and AAM features outperforms the well-known traditional combination of Mel-frequency cepstral coefficient (MFCC) acoustic features and discrete cosine transform (DCT) visual features.
  • Keywords
    audio signal processing; hidden Markov models; speech recognition; AAM based features; AAM feature; AAM visual features; SHMM recognizer; VidTIMIT audio visual phoneme recognition; VidTIMIT database; WERBC feature; active appearance model based features; artificial noises; audio modality; audio visual phoneme recognition system; auditory based equivalent rectangular bandwidth; human auditory motivated acoustic wavelet features; multimodal phoneme recognition system; proportional weighting factor; speech recognizer; stream audio visual synchronous hidden Markov model; visual modality; Acoustics; Active appearance model; Feature extraction; Hidden Markov models; Shape; Speech; Visualization; AAM; Audio visual phoneme recognition; HCI; Vid-TIMIT Corpus; WERBC;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International Conference on
  • Conference_Location
    Kolkata
  • Type
    conf
  • DOI
    10.1109/ReTIS.2015.7232917
  • Filename
    7232917