DocumentCode
3493718
Title
Unsupervised detection of multimodal clusters in edited recordings
Author
Dielmann, Alfred
Author_Institution
IDIAP Res. Inst., Martigny, Switzerland
fYear
2010
fDate
4-6 Oct. 2010
Firstpage
177
Lastpage
182
Abstract
Edited video recordings, such as talk shows and sitcoms, often include Audio-Visual clusters: frequent repetitions of closely related acoustic and visual content. For example, during a political debate, whenever a given participant holds the conversational floor, her/his voice tends to co-occur with camera views (i.e. shots) showing her/his portrait. Unlike previous Audio-Visual clustering work, this paper proposes an unsupervised approach that detects Audio-Visual clusters without making assumptions about the recording content, such as the presence of specific participant voices or faces. Sequences of audio and shot clusters are automatically identified using unsupervised audio diarization and shot segmentation techniques. Audio-Visual clusters are then formed by ranking the co-occurrences between these two segmentations and selecting those that occur significantly more often than chance. Numerical experiments performed on a collection of 70 political debates, comprising more than 43 hours of live edited recordings, showed that the automatically extracted Audio-Visual clusters closely match the ground-truth annotation, achieving high purity.
Keywords
audio-visual systems; pattern clustering; audio visual clustering; edited recording; multimodal cluster; unsupervised detection; Cameras; Gold; Hidden Markov models; Irrigation; Manuals; Measurement; Visualization;
fLanguage
English
Publisher
ieee
Conference_Titel
Multimedia Signal Processing (MMSP), 2010 IEEE International Workshop on
Conference_Location
Saint Malo
Print_ISBN
978-1-4244-8110-1
Electronic_ISBN
978-1-4244-8111-8
Type
conf
DOI
10.1109/MMSP.2010.5662015
Filename
5662015