• DocumentCode
    2898582
  • Title
    Detection of Mouth Movements and its Applications to Cross-Modal Analysis of Planning Meetings
  • Author
    Xiong, Yingen ; Fang, Bing ; Quek, Francis
  • Author_Institution
    Nokia Res. Center, Palo Alto, CA, USA
  • Volume
    1
  • fYear
    2009
  • fDate
    18-20 Nov. 2009
  • Firstpage
    225
  • Lastpage
    229
  • Abstract
    Detection of meaningful meeting events is very important for cross-modal analysis of planning meetings. Many important events are related to speakers' communication behavior. In visual-audio based speaker detection, mouth positions and movements are needed as visual information. We present our techniques to detect the mouth positions and movements of a talking person in meetings. First, we build a skin color model with a Gaussian distribution. After training with skin color samples, we obtain the parameters of the model. A skin color filter is created from the model with a threshold, and we use it to detect face regions for all participants in the meeting. Second, we create a mouth template and perform image matching to find mouth candidates in each face region. Next, because the color of the lip area differs from that of the rest of the face region, we compare the color dissimilarity between each candidate and the original color model and select the mouth area from the candidates. Finally, we detect mouth movements by computing normalized cross-correlation coefficients of the mouth area between two successive frames. A real-time system has been implemented to track a speaker's mouth position and detect mouth movements. Applications also include video conferencing and improving human computer interaction (HCI). Examples in meeting environments and others are provided.
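
The first and last steps of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the color space, the model parameters, or any thresholds, so the two-component chromaticity input, the function names `skin_score`, `normalized_cross_correlation`, and `mouth_moved`, and the threshold value 0.9 are all assumptions chosen for illustration. Patches are flat lists of grayscale values of equal length.

```python
import math


def skin_score(chroma, mean, cov):
    """Unnormalized Gaussian score of a 2-D chromaticity value.

    `mean` is a 2-tuple and `cov` a 2x2 nested tuple, both assumed to come
    from training on skin samples (the actual parameters are not given in
    the abstract). Returns a value in (0, 1]; 1.0 means the pixel equals
    the model mean.
    """
    dx = chroma[0] - mean[0]
    dy = chroma[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    # Squared Mahalanobis distance using the closed-form 2x2 inverse.
    m2 = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.exp(-0.5 * m2)


def normalized_cross_correlation(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    if len(patch_a) != len(patch_b) or not patch_a:
        raise ValueError("patches must be non-empty and the same size")
    mean_a = sum(patch_a) / len(patch_a)
    mean_b = sum(patch_b) / len(patch_b)
    da = [v - mean_a for v in patch_a]
    db = [v - mean_b for v in patch_b]
    num = sum(x * y for x, y in zip(da, db))
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        # At least one patch is flat, so the coefficient is undefined.
        # Assumption: report full similarity only if the patches are identical.
        return 1.0 if list(patch_a) == list(patch_b) else 0.0
    return num / denom


def mouth_moved(prev_patch, curr_patch, threshold=0.9):
    """Flag movement when successive mouth patches correlate weakly.

    The threshold is a hypothetical value, not taken from the paper.
    """
    return normalized_cross_correlation(prev_patch, curr_patch) < threshold
```

A pixel would pass the skin color filter when `skin_score` exceeds a chosen threshold, and a frame pair would be marked as mouth movement when `mouth_moved` returns `True`. The template-matching and lip-color comparison steps are omitted here because the abstract gives too little detail to sketch them without guessing.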
  • Keywords
    Gaussian distribution; image colour analysis; image matching; cross-modal analysis; human computer interaction; mouth movements; mouth template; planning meetings; skin color filter; skin color model; speaker communication behavior; video conferencing; visual information; visual-audio based speaker detection; Event detection; Face detection; Filters; Meeting planning; Mouth; Real time systems; Skin
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia Information Networking and Security, 2009. MINES '09. International Conference on
  • Conference_Location
    Hubei
  • Print_ISBN
    978-0-7695-3843-3
  • Electronic_ISBN
    978-1-4244-5068-8
  • Type
    conf
  • DOI
    10.1109/MINES.2009.258
  • Filename
    5368362