DocumentCode :
177626
Title :
Cultivating vocal activity detection for music audio signals in a circulation-type crowdsourcing ecosystem
Author :
Yoshii, Kazuyoshi ; Fujihara, Hiromasa ; Nakano, T. ; Goto, Masataka
Author_Institution :
Nat. Inst. of Adv. Ind. Sci. & Technol. (AIST), Tsukuba, Japan
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
624
Lastpage :
628
Abstract :
This paper presents a crowdsourcing-based self-improvement framework of vocal activity detection (VAD) for music audio signals. A standard approach to VAD is to train a vocal-and-non-vocal classifier by using labeled audio signals (training set) and then use that classifier to label unseen signals. Using this technique, we have developed an online music-listening service called Songle that can help users better understand music by visualizing automatically estimated vocal regions and pitches of arbitrary songs existing on the Web. The accuracy of VAD is limited, however, because in general the acoustic characteristics of the training set are different from those of real songs on the Web. To overcome this limitation, we adapt a classifier by leveraging vocal regions and pitches corrected by volunteer users. Unlike Wikipedia-type crowdsourcing, our Songle-based framework can amplify user contributions: error corrections made for a limited number of songs improve VAD for all songs. This gives better music listening experiences to all users as non-monetary rewards.
Keywords :
audio signal processing; electronic music; error correction; signal detection; speech processing; Songle-based framework; VAD; Web; Wikipedia-type crowdsourcing; acoustic characteristics; circulation-type crowdsourcing ecosystem; crowdsourcing-based self-improvement framework; error corrections; labeled audio signals; music audio signals; music listening experiences; online music-listening service; vocal activity detection; vocal-and-non-vocal classifier; Acoustics; Crowdsourcing; Estimation; Feature extraction; Harmonic analysis; Multiple signal classification; Speech; Music signal analysis; crowdsourcing; melody extraction; probabilistic models; vocal activity detection;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6853671
Filename :
6853671