Title :
Leveraging structural information in music-speech detection
Author :
Han, Jinyu ; Coover, Bob
Author_Institution :
Media Technol. Lab., Gracenote, Emeryville, CA, USA
Abstract :
Detecting music or speech signals in an audio mixture is an important but challenging problem. Even more challenging is detecting when both are present in a signal at the same time. This problem requires not only discriminating speech and music from each other but also detecting the presence of each in a mixture with interfering signals. In this paper, we address the problem of detecting speech and music signals in the presence of each other. We focus on leveraging features that capture the structural properties of audio to improve the performance of concurrent music-speech detection. Continuous Frequency Activation (CFA) is used to account for sustained pitch/harmonic activity, and a new feature called Transient Activation (TAC) is proposed to account for transient/percussive activity in an audio signal. The effectiveness of these features, along with other acoustic features, is evaluated in different statistical classification schemes. Feature selection is conducted to find the feature set that maximizes detection performance. Experimental results on real-world broadcast recordings show that using these techniques to incorporate the structural information of audio significantly improves detection.
Keywords :
audio signal processing; music; speech processing; statistical analysis; CFA; TAC; acoustic features; audio mixture; audio signal; audio structural information; audio structural properties; continuous frequency activation; interfering signals; leveraging structural information; music signal detection; music speech detection; pitch harmonic activities; speech signal detection; speech signals; statistical classification schemes; transient activation; Entropy; Feature extraction; Harmonic analysis; Multiple signal classification; Spectrogram; Speech; Transient analysis; SVM; acoustical signal detection; audio classification; music/speech detection; pitch; transient
Conference_Title :
Multimedia and Expo Workshops (ICMEW), 2013 IEEE International Conference on
Conference_Location :
San Jose, CA
DOI :
10.1109/ICMEW.2013.6618387