• DocumentCode
    2935922
  • Title

    Detection of Inconsistency Between Subject and Speaker Based on the Co-occurrence of Lip Motion and Voice Towards Speech Scene Extraction from News Videos

  • Author

    Kumagai, Shogo ; Doman, Keisuke ; Takahashi, Tomokazu ; Deguchi, Daisuke ; Ide, Ichiro ; Murase, Hiroshi

  • Author_Institution
    Grad. Sch. of Inf. Sci., Nagoya Univ., Nagoya, Japan
  • fYear
    2011
  • fDate
    5-7 Dec. 2011
  • Firstpage
    311
  • Lastpage
    318
  • Abstract
    We propose a method to detect inconsistency between a subject and the speaker for extracting speech scenes from news videos. Speech scenes in news videos contain a wealth of multimedia information and are valuable as archived material. One approach to extracting speech scenes from news videos uses the position and size of a face region. However, it is difficult to extract them with such an approach alone, since news videos contain non-speech scenes where the speaker is not the subject, such as narrated scenes. To solve this problem, we propose a method to discriminate between speech scenes and narrated scenes based on the co-occurrence between a subject's lip motion and the speaker's voice. The proposed method uses lip shape and degree of lip opening as visual features representing a subject's lip motion, and uses voice volume and phoneme as audio features representing a speaker's voice. The proposed method then discriminates between speech scenes and narrated scenes based on the correlations of these features. We report the results of experiments on videos captured under laboratory conditions and on actual broadcast news videos. The results showed the effectiveness of our method and the feasibility of our research goal.
  • Keywords
    feature extraction; object detection; video signal processing; lip motion co-occurrence; lip opening degree; lip shape; news video; phoneme; speech scene extraction; subject-speaker inconsistency detection; voice co-occurrence; voice volume; Accuracy; Face; Feature extraction; Speech; Vectors; Videos; Visualization; audiovisual integration; correlation; lip motion; news videos
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia (ISM), 2011 IEEE International Symposium on
  • Conference_Location
    Dana Point, CA
  • Print_ISBN
    978-1-4577-2015-4
  • Type

    conf

  • DOI
    10.1109/ISM.2011.56
  • Filename
    6123363