Title :
Shout detection in noise
Author :
Pohjalainen, Jouni ; Alku, Paavo ; Kinnunen, Tomi
Author_Institution :
Dept. of Signal Process. & Acoust., Aalto Univ., Espoo, Finland
Abstract :
For the task of detecting shouted speech in a noisy environment, this paper introduces a system based on mel frequency cepstral coefficient (MFCC) feature extraction, unsupervised frame dropping and Gaussian mixture model (GMM) classification. The evaluation material consists of phonemically identical speech and shouting as well as environmental noise of varying levels. The performance of the shout detection system is analyzed by varying the MFCC feature extraction with respect to 1) the feature vector length and 2) the spectrum estimation method. As for feature vector length, the best performance is obtained using 30 MFCCs, which is more than is conventionally used. In spectrum estimation, a scheme that combines a linear prediction spectrum envelope with spectral fine structure outperforms the conventional FFT.
Keywords :
Gaussian processes; feature extraction; speech recognition; FFT; GMM classification; Gaussian mixture model; MFCC; linear prediction spectrum; mel frequency cepstral coefficient; shouted speech detection; unsupervised frame dropping; Hidden Markov models; Materials; Signal to noise ratio; Speech; shout detection
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
Print_ISBN :
978-1-4577-0538-0
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2011.5947471