DocumentCode
226408
Title
The optimization of PLP feature extraction for LVCSR recognition of MP3 data
Author
Borsky, Michal ; Pollak, Petr
Author_Institution
Fac. of Electr. Eng., Czech Tech. Univ. in Prague, Prague, Czech Republic
fYear
2014
fDate
9-10 Sept. 2014
Firstpage
55
Lastpage
58
Abstract
This paper analyses the contribution of an optimized PLP feature extraction setup and of feature normalization to the performance of an automatic speech recognition system on data compressed by the MP3 algorithm. An experimental study on loop-digit recognition and large vocabulary continuous speech recognition (LVCSR) tasks showed that a proper setup can negate the effect of lower compression rates, which then achieve results comparable with higher rates. The second finding is that normalization techniques contribute significantly to overall performance, especially for shorter windows/shifts and lower compression rates. Acoustic models trained on 160 kbit/s, 32 kbit/s and 16 kbit/s data achieved 34.17%, 41.88% and 36.4% WER, respectively, on the LVCSR task. In comparison, acoustic models trained on non-compressed data achieved 28.56% WER.
Keywords
acoustic signal processing; data compression; feature extraction; speech coding; speech recognition; vocabulary; MP3 algorithm; MP3 data LVCSR recognition; PLP feature extraction optimization; WER; automatic speech recognition system; data compression; feature normalization technique; loop-digit recognition; noncompressed acoustic model; word error rate; Acoustic distortion; Acoustics; Digital audio players; Feature extraction; Speech; Speech recognition; LVCSR; MP3 compression; PLP features; acoustic modelling; cepstral mean and variance normalization; cepstral mean normalization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Applied Electronics (AE), 2014 International Conference on
Conference_Location
Pilsen
ISSN
1803-7232
Print_ISBN
978-80-261-0276-2
Type
conf
DOI
10.1109/AE.2014.7011667
Filename
7011667