DocumentCode :
3723931
Title :
Enhancing the recognition of children's speech on acoustically mismatched ASR system
Author :
S Shahnawazuddin;Hemant Kumar Kathania;Rohit Sinha
Author_Institution :
Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, 781039, India
fYear :
2015
Firstpage :
1
Lastpage :
5
Abstract :
The work presented in this paper explores the issues of recognizing children's speech using acoustic models trained on adults' speech data. In such conditions, on account of the large acoustic mismatch between training and test data, a severe degradation in recognition performance is noted. In our earlier work, a binary weighting of cepstral features as well as of acoustic model parameters was explored to address this problem. In this paper, a soft-weighting (SW) scheme is proposed to overcome the information loss incurred by the simple binary weighting scheme. This is achieved through a low-rank projection learned using adults' training data. The transform so derived emphasizes the principal dimensions of acoustic variation in adults' speech. During testing, the transform maps children's test data to the space of the training data and thus suppresses the mismatched dimensions. The proposed scheme is verified experimentally using a recognition system trained on adults' data only as well as another system trained using adults' and children's data pooled together. The effectiveness of acoustic model adaptation is also explored to further enhance the system performance. Combining SW with cluster model interpolation leads to a relative improvement of 14% over the baseline.
Keywords :
"Hidden Markov models","Covariance matrices","Computational modeling","Indexes","Measurement","Mel frequency cepstral coefficient","Matrix decomposition"
Publisher :
IEEE
Conference_Titel :
TENCON 2015 - 2015 IEEE Region 10 Conference
ISSN :
2159-3442
Print_ISBN :
978-1-4799-8639-2
Electronic_ISBN :
2159-3450
Type :
conf
DOI :
10.1109/TENCON.2015.7373176
Filename :
7373176