DocumentCode :
177664
Title :
Segmentation of music video streams in music pieces through audio-visual analysis
Author :
Sargent, Gabriel ; Hanna, Philip ; Nicolas, H.
Author_Institution :
LaBRI, Univ. de Bordeaux, Talence, France
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
724
Lastpage :
728
Abstract :
Today, technologies for information storage and transmission allow the creation and development of huge databases of multimedia content. Tools are needed to facilitate access to and browsing of this content. In this context, this article focuses on the segmentation of a particular category of multimedia content, audio-visual musical streams, into music pieces. This category includes concert audio-video recordings and sequences of music videos such as those found on music TV channels. Current approaches consist of supervised clustering into a few audio classes (music, speech, noise), and, to our knowledge, no consistent evaluation has yet been performed on audio-visual musical streams. In this paper, we aim to estimate the temporal boundaries of music pieces by relying on the assumed homogeneity of their musical and visual properties. We consider an unsupervised approach based on the generalized likelihood ratio to evaluate the presence of statistical breakdowns of MFCCs, Chroma vectors, dominant Hue and Lightness over time. An evaluation of this approach on 15 manually annotated concert streams shows the advantage of combining tonal content features with timbral ones, and a modest impact from the joint use of visual features in boundary estimation.
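The generalized likelihood ratio mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each window of feature vectors (for example MFCC or Chroma frames) is modeled by a single full-covariance Gaussian, and the names `glr_distance`, `glr_curve`, `win` and `eps` are hypothetical choices made for this illustration.

```python
import numpy as np

def glr_distance(x, y, eps=1e-6):
    """GLR distance between two windows of feature vectors (frames x dims).

    Assumption: each window is modeled by one full-covariance Gaussian.
    Larger values suggest the two windows follow different distributions.
    """
    z = np.vstack([x, y])  # pooled window: hypothesis "no boundary"

    def half_n_logdet(w):
        # Maximum-likelihood covariance (bias=True), lightly regularized
        cov = np.atleast_2d(np.cov(w, rowvar=False, bias=True))
        cov = cov + eps * np.eye(cov.shape[0])
        _, logdet = np.linalg.slogdet(cov)
        return 0.5 * len(w) * logdet

    return half_n_logdet(z) - half_n_logdet(x) - half_n_logdet(y)

def glr_curve(features, win):
    """Slide two adjacent windows of `win` frames over the feature sequence.

    scores[t] compares frames [t-win, t) with frames [t, t+win);
    positions too close to the edges keep a score of zero.
    """
    n = len(features)
    scores = np.zeros(n)
    for t in range(win, n - win + 1):
        scores[t] = glr_distance(features[t - win:t], features[t:t + win])
    return scores

# Synthetic usage example: two "pieces" with different feature statistics
rng = np.random.default_rng(0)
piece_a = rng.normal(0.0, 1.0, size=(100, 3))
piece_b = rng.normal(5.0, 1.0, size=(100, 3))
features = np.vstack([piece_a, piece_b])
scores = glr_curve(features, win=30)
boundary = int(np.argmax(scores))  # expected to fall near frame 100
```

In this sketch, candidate boundaries are peaks of the score curve; how peaks are selected and how audio and visual features are combined is left open here, since the abstract does not specify it.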
Keywords :
audio recording; audio-visual systems; cepstral analysis; edge detection; image segmentation; image sequences; multimedia databases; music; pattern clustering; unsupervised learning; video recording; video streaming; MFCC; audio classes; audio-visual analysis; audio-visual musical stream; chroma vector; concert audio-video recording; generalized likelihood ratio; information storage; information transmission; multimedia content database; multimedia content segmentation; music pieces; music video sequence; music video stream segmentation; statistical breakdown; supervised clustering; temporal boundary estimation; unsupervised approach; visual feature; Electric breakdown; Mel frequency cepstral coefficient; Music; Speech; Streaming media; Vectors; Visualization; Multimedia signal processing; audio-visual stream; music video; segmentation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6853691
Filename :
6853691