Regional Information Center for Science and Technology - Visual voice activity detection based on spatiotemporal information and bag of words

DocumentCode :

3707631

Title :

Visual voice activity detection based on spatiotemporal information and bag of words

Author :

Foteini Patrona;Alexandros Iosifidis;Anastasios Tefas;Nikolaos Nikolaidis;Ioannis Pitas

Author_Institution :

Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece

Year :

2015

Firstpage :

2334

Lastpage :

2338

Abstract :

This paper proposes a novel method for Visual Voice Activity Detection (V-VAD). The method describes facial region videos using local shape and motion information extracted at spatiotemporal locations of interest, and represents them with the Bag of Words (BoW) model. Facial region videos are then classified by a Single-hidden Layer Feedforward Neural (SLFN) network trained with the recently proposed kernel Extreme Learning Machine (kELM) algorithm on training facial videos depicting talking and non-talking persons. Experimental results on two publicly available V-VAD data sets demonstrate the effectiveness of the proposed method: it achieves better generalization performance on unseen users than recently proposed state-of-the-art methods.
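The two stages named in the abstract, BoW representation followed by kELM classification, can be illustrated with a minimal sketch. This is not the authors' implementation: the codebook is assumed to be given (in practice it would typically be learned by clustering training descriptors), the RBF kernel and the values of `C` and `gamma` are illustrative assumptions, and the names `bow_histogram`, `rbf_kernel` and `KernelELM` are hypothetical. The training step uses the standard kernel ELM closed form, with output weights obtained by solving (K + I/C) beta = T.

```python
import numpy as np


def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and return
    an L1-normalized histogram of codeword counts (the BoW vector)."""
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()


def rbf_kernel(A, B, gamma):
    """RBF kernel matrix between rows of A and rows of B (assumed kernel)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)


class KernelELM:
    """Two-class kernel ELM sketch: labels 0 (non-talking) and 1 (talking)."""

    def __init__(self, C=100.0, gamma=1.0):
        self.C = C          # regularization strength (assumed value)
        self.gamma = gamma  # RBF width parameter (assumed value)

    def fit(self, X, y):
        y = np.asarray(y)
        self.X_train = np.asarray(X, dtype=float)
        n = len(y)
        # Target matrix with +1 for the true class and -1 elsewhere.
        T = -np.ones((n, 2))
        T[np.arange(n), y] = 1.0
        K = rbf_kernel(self.X_train, self.X_train, self.gamma)
        # Closed-form kernel ELM solution: (K + I/C) beta = T.
        self.beta = np.linalg.solve(K + np.eye(n) / self.C, T)
        return self

    def predict(self, X):
        K_test = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.gamma)
        # The class with the largest network output wins.
        return (K_test @ self.beta).argmax(axis=1)
```

In use, each facial region video would be reduced to one histogram with `bow_histogram`, the histograms of the training videos would be stacked into a matrix and passed to `KernelELM().fit`, and `predict` would then label the histograms of unseen videos.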

Keywords :

"Training","Visualization","Speech","Feature extraction","Kernel","Spatiotemporal phenomena","Shape"

Publisher :

IEEE

Conference_Title :

Image Processing (ICIP), 2015 IEEE International Conference on

Type :

conf

DOI :

10.1109/ICIP.2015.7351219

Filename :

7351219

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3707631