DocumentCode :
1518716
Title :
Gammatone Cepstral Coefficients: Biologically Inspired Features for Non-Speech Audio Classification
Author :
Valero, Xavier ; Alias, Francesc
Author_Institution :
GTM-Grup de Recerca en Tecnologies Media, La Salle-Univ. Ramon Llull, Barcelona, Spain
Volume :
14
Issue :
6
fYear :
2012
Firstpage :
1684
Lastpage :
1689
Abstract :
In the context of non-speech audio recognition and classification for multimedia applications, it becomes essential to have a set of features able to accurately represent and discriminate among audio signals. Mel frequency cepstral coefficients (MFCC) have become a de facto standard for audio parameterization. Taking as a basis the MFCC computation scheme, the Gammatone cepstral coefficients (GTCCs) are a biologically inspired modification employing Gammatone filters with equivalent rectangular bandwidth bands. In this letter, the GTCCs, which have been previously employed in the field of speech research, are adapted for non-speech audio classification purposes. Their performance is evaluated on two audio corpora of 4 h each (general sounds and audio scenes), following two cross-validation schemes and four machine learning methods. According to the results, classification accuracies are significantly higher when employing GTCC rather than other state-of-the-art audio features. As a detailed analysis shows, with a similar computational cost, the GTCC are more effective than MFCC in representing the spectral characteristics of non-speech audio signals, especially at low frequencies.
Keywords :
audio signal processing; cepstral analysis; filtering theory; learning (artificial intelligence); multimedia computing; signal classification; GTCC; MFCC; Mel frequency cepstral coefficients; audio corpora; audio parameterization; audio scenes; biologically-inspired features; computational cost; cross-validation schemes; de-facto standard; gammatone cepstral coefficients; gammatone filters; general sounds; machine learning methods; multimedia applications; nonspeech audio classification; nonspeech audio recognition; performance evaluation; rectangular bandwidth bands; spectral characteristics; Bandwidth; Computational efficiency; Filter banks; Humans; Mel frequency cepstral coefficient; Audio classification; Gammatone cepstral coefficients; audio scene recognition; environmental sound; feature extraction;
fLanguage :
English
Journal_Title :
Multimedia, IEEE Transactions on
Publisher :
IEEE
ISSN :
1520-9210
Type :
jour
DOI :
10.1109/TMM.2012.2199972
Filename :
6202347