Title :
Speech/Music Classification of Short Audio Segments
Author_Institution :
Dolby Labs., Inc., Stockholm, Sweden
Abstract :
Speech/music classification of digital audio is a popular research topic in academia and is increasingly used in industry. Most established methods combine carefully hand-crafted features with Gaussian mixture models. To perform at their best, some of these features require a long latency due to look-ahead, incur a long onset error, or both. This paper takes a different approach to the problem by exploring recent trends in machine learning that have led to improvements in other fields. Specifically, it shows that comparable performance can be achieved by analyzing only segments on the order of tens of milliseconds, without using preceding or following audio. This is done with a method that automatically generates arbitrarily many features from preprocessed spectrograms.
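The abstract does not specify the preprocessing or the feature-generation method, so the following is only a minimal sketch of the general idea, under stated assumptions: a 30 ms segment at 16 kHz is turned into a log-magnitude spectrogram using no audio outside the segment, and is then encoded against a dictionary whose size sets the number of features. The frame length, hop size, random dictionary, soft-threshold encoder, and the names `log_spectrogram` and `encode` are all hypothetical choices made for illustration, not the paper's method.

```python
import numpy as np


def log_spectrogram(segment, frame_len=128, hop=64):
    """Log-magnitude spectrogram computed from the segment alone.

    No samples before or after the segment are used, so there is
    no look-ahead and no dependence on earlier audio.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(segment) - frame_len) // hop
    frames = np.stack(
        [segment[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-8)  # shape: (n_frames, frame_len // 2 + 1)


def encode(spec, dictionary, threshold=0.5):
    """Map a spectrogram to one non-negative feature per dictionary atom.

    Assumption: a soft-threshold encoder stands in for whatever
    feature-learning scheme the paper actually uses.
    """
    x = spec.ravel()
    x = (x - x.mean()) / (x.std() + 1e-8)  # per-segment normalization
    return np.maximum(0.0, dictionary @ x - threshold)


sample_rate = 16000                      # assumed sampling rate
segment_len = sample_rate * 30 // 1000   # 30 ms -> 480 samples

rng = np.random.default_rng(0)
segment = rng.standard_normal(segment_len)  # placeholder for real audio

spec = log_spectrogram(segment)

# Assumption: a random, unit-norm dictionary stands in for a learned one.
# The number of rows (features) can be chosen freely.
n_features = 256
dictionary = rng.standard_normal((n_features, spec.size))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

features = encode(spec, dictionary)
```

In a complete system the dictionary would be learned from training spectrograms and the resulting feature vectors passed to a classifier; both steps are omitted here because the abstract gives no details about them.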
Keywords :
Gaussian processes; audio signal processing; feature extraction; learning (artificial intelligence); mixture models; signal classification; speech processing; Gaussian mixture model; automatic generation; digital audio segment; hand-crafted features; machine learning; music classification; spectrogram; speech classification; Accuracy; Encoding; Spectrogram; Speech; Speech recognition; Support vector machines; Training; audio classification; feature learning; sparse coding;
Conference_Title :
Multimedia (ISM), 2014 IEEE International Symposium on
Conference_Location :
Taichung
Print_ISBN :
978-1-4799-4312-8
DOI :
10.1109/ISM.2014.27