• DocumentCode
    691995
  • Title
    Multi-modal Voice Activity Detection by Embedding Image Features into Speech Signal
  • Author
    Abe, Y. ; Ito, Akinori
  • Author_Institution
    Grad. Sch. of Eng., Tohoku Univ., Sendai, Japan
  • fYear
    2013
  • fDate
    16-18 Oct. 2013
  • Firstpage
    271
  • Lastpage
    274
  • Abstract
    Lip movement is closely related to speech because the lips move when we talk. The idea behind this work is to extract a lip movement feature from facial video and embed it into the speech signal using an information hiding technique. With the proposed framework, we can provide advanced speech communication using only a speech signal that carries the lip movement features, without increasing the bitrate of the signal. In this paper, we present the basic framework of the method and apply the proposed method to multi-modal voice activity detection (VAD). In a detection experiment using a support vector machine, we obtained better performance than audio-only VAD in a noisy environment. In addition, we investigated how embedding data into the speech signal affects sound quality and detection performance.
  • Keywords
    feature extraction; image recognition; speech processing; video signal processing; voice communication; facial video; image feature embedding; lip movement feature; multimodal voice activity detection; speech communication; speech signal; Bit rate; Feature extraction; Noise; Speech; Speech recognition; Support vector machines; Visualization; audio-visual; information hiding; multi-modal; voice activity detection (VAD);
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Information Hiding and Multimedia Signal Processing, 2013 Ninth International Conference on
  • Conference_Location
    Beijing
  • Type
    conf
  • DOI
    10.1109/IIH-MSP.2013.76
  • Filename
    6846632