Title :
Analysis of cross-gender adaptation using MAP and MLLR in speech recognition systems
Author :
Mahiba, S. Magdalene ; Christina, S. Lilly ; Vijayalakshmi, P. ; Nagarajan, T.
Author_Institution :
SSN Coll. of Eng., Chennai, India
Abstract :
A speech recognition system developed with context-dependent phonemes captures the co-articulation effect and performs better than systems developed with context-independent units. However, the performance of the system also depends on the speaker. This speaker dependence arises from speaker-dependent speech features, and variation in vocal tract length and shape is the major cause of this inter-speaker variation. As a result, speaker-dependent (SD) systems outperform speaker-independent (SI) systems. It is well established in the literature that speaker adaptation can improve the recognition performance of an SI system to the level of an SD system. This paper analyzes the amount and ratio of male and female training data for which cross-gender speaker adaptation gives higher performance. The speaker adaptation techniques MAP and MLLR are implemented using the TIMIT speech corpus. It is observed that MLLR adapts the model parameters better than MAP even with 24 s of adaptation data. It is also inferred that training the system with both male and female data results in better cross-gender adaptation performance than training with either male or female data alone, primarily because the system parameters differ greatly between male and female speakers. For the minimal amount of data, the overall recognition performance of the context-dependent system improves by 0.55% with MAP adaptation and 2.75% with MLLR adaptation over the unadapted recognition system.
Keywords :
maximum likelihood estimation; regression analysis; speaker recognition; MAP adaptation; MLLR adaptation; SD system; SI system; coarticulation effect; context-dependent phonemes; cross-gender speaker adaptation analysis; female training data; inter-speaker variation; male training data; maximum a posteriori algorithm; maximum likelihood linear regression; speaker-dependent systems; speaker-dependent speech features; speaker-independent systems; speech recognition systems; vocal tract length; vocal tract shape; Adaptation models; Data models; Market research; Silicon; Speech; Speech recognition; Training; MAP; MLLR; Speaker adaptation;
Conference_Title :
Recent Trends in Information Technology (ICRTIT), 2013 International Conference on
Conference_Location :
Chennai
DOI :
10.1109/ICRTIT.2013.6844235