DocumentCode :
1864611
Title :
Improving automatic speech recognition in noise by energy normalization and signal resynthesis
Author :
Giurgiu, Mircea ; Kabir, Ahsanul
Author_Institution :
Dept. of Telecommun., Tech. Univ. of Cluj-Napoca, Cluj-Napoca, Romania
fYear :
2011
fDate :
25-27 Aug. 2011
Firstpage :
311
Lastpage :
314
Abstract :
This paper presents the contribution of an energy normalization technique to automatic speech recognition in babble noise, where the machine assumes that speech and noise have the same energy level and therefore the same loudness. The loudness of target speech and noise is likewise an important factor when humans recognize speech in everyday conditions. Humans recognize louder speech better than quieter speech, even when both reach the listener at the same signal-to-noise ratio (SNR). This phenomenon was tested on machines, whose recognition performance varies roughly from 75% to 90% across a wide range of SNRs. By contrast, human recognition performance is more SNR-dependent: it varies from 30% to 95%. With energy normalization, the machines have a lower average recognition rate than humans in less noisy conditions (positive SNR), but tend to outperform humans in highly noisy conditions (negative SNRs such as -4 dB and -6 dB). The study also confirms that formant processing has no significant effect on recognizing speech in noise. This in turn implies that formant-based vocal tract length normalization is unable to improve the performance of machines in noise.
Keywords :
signal synthesis; speech recognition; automatic speech recognition; babble noise; energy normalization technique; formant based vocal tract length normalization; formant processing; human recognition performance; signal resynthesis; signal to noise ratio; Color; Humans; Noise measurement; Signal to noise ratio; Speech; Speech recognition; automatic speech recognition; computational modelling; energetic masking; energy normalization; informational masking; intelligibility; speech perception;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Computer Communication and Processing (ICCP), 2011 IEEE International Conference on
Conference_Location :
Cluj-Napoca
Print_ISBN :
978-1-4577-1479-5
Electronic_ISBN :
978-1-4577-1481-8
Type :
conf
DOI :
10.1109/ICCP.2011.6047886
Filename :
6047886