Enhancing the recognition of children's speech on acoustically mismatched ASR system

Author

S Shahnawazuddin;Hemant Kumar Kathania;Rohit Sinha

Author_Institution

Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, 781039, India

fYear

2015

Firstpage

1

Lastpage

5

Abstract

The work presented in this paper explores the issues of recognizing children's speech using acoustic models trained on adults' speech data. In such conditions, the large acoustic mismatch between training and test data causes a severe degradation in recognition performance. In our earlier work, a binary weighting of cepstral features as well as of acoustic model parameters was explored to address this mismatch. In this paper, a soft-weighting (SW) is proposed to overcome the information loss incurred by the simple binary weighting scheme. This is achieved through a low-rank projection learned using adults' training data. The transform so derived emphasizes the principal dimensions of acoustic variation in adults' speech. During testing, the transform maps children's test data to the space of the training data and thus suppresses the mismatched dimensions. The proposed scheme is verified experimentally using a recognition system trained on adults' data only as well as another system trained using adults' and children's data pooled together. The effectiveness of acoustic model adaptation is also explored to further enhance the system performance. Combining SW with cluster model interpolation leads to a relative improvement of 14% over the baseline.
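
The idea of learning a low-rank projection on training data and applying it to mismatched test data can be illustrated with a minimal sketch. This is not the authors' soft-weighting scheme, whose details the abstract does not give; it is a plain PCA-style projection, used here as a stand-in. The function names, the 13-dimensional features, the rank of 4, and the synthetic "adult" and "child" data are all assumptions made for illustration.

```python
import numpy as np


def learn_projection(train_feats, rank):
    """Learn a rank-`rank` projector from training features (hypothetical helper).

    Returns the training mean and a D x D matrix that projects onto the
    top-`rank` principal directions of the training covariance.
    """
    mean = train_feats.mean(axis=0)
    cov = np.cov(train_feats - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:rank]]
    return mean, top @ top.T


def apply_projection(feats, mean, proj):
    """Map features onto the training subspace, keeping the original dimension."""
    return (feats - mean) @ proj + mean


rng = np.random.default_rng(0)

# Synthetic stand-in for adult training features: most of the variance
# lies in the first 4 of 13 dimensions (an assumption for this demo).
scales = np.array([5.0] * 4 + [0.1] * 9)
adult = rng.normal(size=(2000, 13)) * scales

# Synthetic stand-in for child test features: variance spread over all
# dimensions, including those that barely vary in the adult data.
child = rng.normal(size=(500, 13)) * 3.0

mean, proj = learn_projection(adult, rank=4)
child_mapped = apply_projection(child, mean, proj)

print("variance in mismatched dims before:", child[:, 4:].var())
print("variance in mismatched dims after: ", child_mapped[:, 4:].var())
```

In this toy setting the projection keeps the directions that dominate the training data and shrinks the rest, which mirrors the suppression of mismatched dimensions described above. A soft weighting would presumably scale dimensions gradually rather than discard them outright, but that is a guess from the abstract alone.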

Keywords

"Hidden Markov models","Covariance matrices","Computational modeling","Indexes","Measurement","Mel frequency cepstral coefficient","Matrix decomposition"

Publisher

IEEE

Conference_Titel

TENCON 2015 - 2015 IEEE Region 10 Conference

ISSN

2159-3442

Print_ISBN

978-1-4799-8639-2

Electronic_ISBN

2159-3450

Type

conf

DOI

10.1109/TENCON.2015.7373176

Filename

7373176