Author_Institution :
Dept. of Comput. Eng., Baskent Univ., Ankara, Turkey
Abstract :
Audio data contains many kinds of sound and is an important source for multimedia applications. One such kind is unstructured environmental sounds (also referred to as audio events), which have noise-like characteristics with flat spectra. Therefore, recognition methods designed for music and speech data are, in general, not appropriate for environmental sounds. In this paper, we propose an MFCC-SVM based approach that exploits both feature representation and learner optimization for efficient recognition of audio events from audio signals. The proposed approach explores MFCC feature representations computed with different window and hop sizes and different numbers of Mel coefficients, and it also optimizes the SVM parameters. The evaluations use 16 audio events from the IEEE Audio and Acoustic Signal Processing (AASP) Challenge Dataset, recorded in live office environments: alert, clear throat, cough, door slam, drawer, keyboard, keys, knock, laughter, mouse, page turn, pen drop, phone, printer, speech, and switch. Our empirical evaluations show that, using the proposed MFCC features and SVM classifier, 5-fold cross-validation yields Precision, Recall and F-measure scores of 62%, 58% and 55%, respectively. Extensive experiments on audio-based event detection using the IEEE AASP Challenge dataset show the effectiveness of the proposed approach.
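The abstract varies three MFCC settings (window size, hop size, number of Mel coefficients) and reports Precision, Recall and F-measure over 16 classes. The pure-Python sketch below illustrates those pieces under stated assumptions; it is not the authors' implementation. The sample rate and window/hop values are hypothetical, the Hz-to-Mel formula is one common convention that the paper may not use, the cepstral step is an unnormalised DCT-II truncated to a chosen number of coefficients, and the metrics assume macro (per-class) averaging, which the abstract does not specify. SVM training and its parameter search are omitted; all function names are hypothetical.

```python
import math
from itertools import product

# Hypothetical sample rate for illustration; not stated in the abstract.
SAMPLE_RATE = 44100


def num_frames(n_samples: int, win: int, hop: int) -> int:
    """Number of complete analysis frames (no padding) for a window/hop pair."""
    if n_samples < win:
        return 0
    return 1 + (n_samples - win) // hop


def hz_to_mel(f: float) -> float:
    """One common Hz-to-Mel convention, 2595 * log10(1 + f/700) (an assumption)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def cepstrum(log_mel_energies: list[float], n_coeffs: int) -> list[float]:
    """Unnormalised DCT-II of log Mel energies, keeping only n_coeffs terms.

    Changing n_coeffs corresponds to changing the number of Mel (cepstral)
    coefficients kept per frame.
    """
    n = len(log_mel_energies)
    return [
        sum(x * math.cos(math.pi * k * (i + 0.5) / n)
            for i, x in enumerate(log_mel_energies))
        for k in range(n_coeffs)
    ]


def macro_prf(y_true: list[str], y_pred: list[str]) -> tuple[float, float, float]:
    """Per-class precision, recall and F-measure, each averaged over classes."""
    labels = sorted(set(y_true) | set(y_pred))
    ps, rs, fs = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        ps.append(prec)
        rs.append(rec)
        fs.append(f1)
    k = len(labels)
    return sum(ps) / k, sum(rs) / k, sum(fs) / k


if __name__ == "__main__":
    one_second = SAMPLE_RATE
    # Hypothetical window/hop grid (in samples) for the feature-representation sweep.
    for win, hop in product([1024, 2048], [256, 512]):
        print(f"win={win} hop={hop} frames={num_frames(one_second, win, hop)}")
    print(cepstrum([0.5, 1.0, 1.5, 1.0, 0.5], 3))
    print(macro_prf(["cough", "cough", "knock", "keys"],
                    ["cough", "knock", "knock", "keys"]))
```

In the toy metric example, averaging per-class F-measures gives a value below both the averaged Precision and the averaged Recall. A single harmonic mean of two numbers can never fall below the smaller of them, so the abstract's F-measure (55%) lying below both Precision (62%) and Recall (58%) is consistent with per-class averaging of this kind, although the abstract does not state the scheme used.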
Keywords :
acoustic signal processing; audio signal processing; cepstral analysis; office environment; optimisation; pattern classification; speech recognition; support vector machines; AASP Challenge dataset; IEEE Audio and Acoustic Signal Processing Challenge dataset; MFCC feature representation; Mel coefficients; Mel frequency cepstral coefficient; SVM classifier; SVM parameter optimization; audio data; audio event recognition; audio signals; audio-based event detection; learner optimization tasks; office live environments; optimized MFCC-SVM approach; speech data; unstructured environmental sounds; Classification algorithms; Keyboards; Mice; Printers; Speech; Support vector machines; Switches; MFCC; SVM; audio event detection; semantic computing;