DocumentCode
177993
Title
Motion history images for online speaker/signer diarization
Author
Gebre, Binyam Gebrekidan ; Wittenburg, Peter ; Heskes, Tom ; Drude, Sebastian
fYear
2014
fDate
4-9 May 2014
Firstpage
1537
Lastpage
1541
Abstract
We present a solution to the problem of online speaker/signer diarization - the task of determining who spoke/signed when. Our solution is based on the idea that gestural activity (hand and body movement) is highly correlated with uttering activity. This correlation is necessarily true for sign languages and mostly true for spoken languages. The novel part of our solution is the use of motion history images (MHI) as a likelihood measure for probabilistically detecting uttering activity. An MHI is an efficient representation of where and how motion occurred over a fixed period of time. We conducted experiments on 4.9 hours of the AMI meeting data and 1.4 hours of a sign language dataset (Kata Kolok data). The best diarization error rate (DER) obtained is 15.70% for sign language and 31.90% for spoken language. These results show that our solution is applicable to real-world settings such as video conferences and information retrieval.
Keywords
image motion analysis; image representation; maximum likelihood estimation; speaker recognition; AMI meeting data; Kata Kolok data; MHI; gestural activity; information retrieval; likelihood measure; motion history images; online speaker-signer diarization; sign language dataset; sign languages; spoken languages; uttering activity; video conferences; Assistive technology; Conferences; Density estimation robust algorithm; Gesture recognition; History; Speech; Speech processing; Speaker diarization; motion energy images; signer diarization
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location
Florence
Type
conf
DOI
10.1109/ICASSP.2014.6853855
Filename
6853855