Analysis of cross-gender adaptation using MAP and MLLR in speech recognition systems

Author

Mahiba, S. Magdalene ; Christina, S. Lilly ; Vijayalakshmi, P. ; Nagarajan, T.

Author_Institution

SSN Coll. of Eng., Chennai, India

fYear

2013

fDate

25-27 July 2013

Firstpage

387

Lastpage

392

Abstract

A speech recognition system developed with context-dependent phonemes captures the co-articulation effect and performs better than systems developed with context-independent units. However, the performance of the system also depends on the speaker. This speaker dependence arises from speaker-dependent speech features, and variation in vocal tract length and shape is the major cause of the inter-speaker variation. As a result, speaker-dependent (SD) systems outperform speaker-independent (SI) systems. It is well established in the literature that speaker adaptation can bring the recognition performance of an SI system up to the level of an SD system. This paper analyzes the amount and ratio of male and female training data for which cross-gender speaker adaptation gives the highest performance. The speaker adaptation techniques MAP and MLLR are implemented using the TIMIT speech corpus. It is observed that MLLR adapts the model parameters better than MAP even with 24 s of adaptation data. It is also inferred that training the system with both male and female data results in better cross-gender adaptation performance than training it with either male or female data alone, primarily because the system parameters differ greatly between male and female speakers. For the minimal amount of adaptation data, the overall recognition performance of the context-dependent system improves over the unadapted recognition system by 0.55% with MAP adaptation and 2.75% with MLLR adaptation.
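
The contrast the abstract draws between MAP and MLLR with little adaptation data can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the prior weight `tau`, the phone labels, and all numeric values are hypothetical toy choices, and the model is reduced to one-dimensional Gaussian means with unit variances and hard frame alignments. Under those assumptions, MAP interpolates each mean with the data aligned to it, so a unit with no adaptation frames stays unchanged, while MLLR fits one shared affine transform and applies it to every mean, including unseen ones.

```python
def map_update_mean(prior_mean, frames, tau=10.0):
    """MAP mean update: interpolate the prior mean with the aligned frames.

    tau is an assumed prior weight; with no frames the prior mean is returned.
    """
    return (tau * prior_mean + sum(frames)) / (tau + len(frames))


def mllr_estimate_1d(means, aligned_frames):
    """Fit one shared transform x ~ a * mu + b by least squares.

    Simplification: 1-D means, unit variances, hard alignments. Needs frames
    for at least two units with different means.
    """
    pairs = [(means[k], x) for k, xs in aligned_frames.items() for x in xs]
    n = len(pairs)
    mean_mu = sum(m for m, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    var_mu = sum((m - mean_mu) ** 2 for m, _ in pairs)
    cov = sum((m - mean_mu) * (x - mean_x) for m, x in pairs)
    a = cov / var_mu
    b = mean_x - a * mean_mu
    return a, b


# Toy speaker-independent means for three hypothetical phone models.
si_means = {"aa": 1.0, "iy": 3.0, "uw": 5.0}

# Toy adaptation frames: nothing was observed for "uw".
adapt = {"aa": [2.0, 2.0], "iy": [4.0, 4.0], "uw": []}

# MAP: each mean moves only as far as its own data pulls it.
map_means = {k: map_update_mean(mu, adapt[k]) for k, mu in si_means.items()}

# MLLR: one transform estimated from all frames, applied to every mean.
a, b = mllr_estimate_1d(si_means, adapt)
mllr_means = {k: a * mu + b for k, mu in si_means.items()}

print("MAP :", map_means)
print("MLLR:", mllr_means)
```

In this toy setting the unseen unit "uw" keeps its original mean under MAP but is shifted under MLLR, which is one common explanation for MLLR adapting faster when adaptation data is scarce.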

Keywords

maximum likelihood estimation; regression analysis; speaker recognition; MAP adaptation; MLLR adaptation; SD system; SI system; coarticulation effect; context-dependent phonemes; cross-gender speaker adaptation analysis; female training data; inter-speaker variation; male training data; maximum a posteriori algorithm; maximum likelihood linear regression; speaker-dependent systems; speaker-dependent speech features; speaker-independent systems; speech recognition systems; vocal tract length; vocal tract shape; Adaptation models; Data models; Market research; Silicon; Speech; Speech recognition; Training; MAP; MLLR; Speaker adaptation;

fLanguage

English

Publisher

IEEE

Conference_Titel

Recent Trends in Information Technology (ICRTIT), 2013 International Conference on

Conference_Location

Chennai

Type

conf

DOI

10.1109/ICRTIT.2013.6844235

Filename

6844235