DocumentCode :
2702343
Title :
Segregation of Speakers for Speaker Adaptation in TV News Audio
Author :
Remes, Ulpu ; Pylkkonen, J. ; Kurimo, Mikko
Author_Institution :
Adaptive Informatics Research Centre, Helsinki University of Technology, Espoo, Finland
Volume :
4
fYear :
2007
fDate :
15-20 April 2007
Abstract :
Speaker adaptation is commonly used to compensate for speaker variation in large vocabulary continuous speech recognition. In a multi-speaker environment where speakers change frequently, speaker segregation is needed to divide the input audio stream into speaker turns. Speaker turns identify the current speaker at each point in time, so speaker adaptation can be performed on the basis of speaker turns. The novelty of this paper is that the speaker-specific transformations are estimated incrementally and in tandem with speaker segregation. Therefore, we need a transformation that can be reliably estimated from a single speaker turn. We propose constrained maximum likelihood linear regression (CMLLR) for this purpose. In tests with Finnish TV news audio, speaker adaptation reduced the average letter error rate by 25% relative to the baseline.
Keywords :
maximum likelihood estimation; regression analysis; speaker recognition; speech processing; Finnish TV news audio; constrained maximum likelihood linear regression; large vocabulary continuous speech recognition; speaker adaptation; speaker segregation; Covariance matrix; Informatics; Maximum likelihood decoding; Maximum likelihood linear regression; Shape measurement; Speech recognition; Streaming media; TV; Testing; Vocabulary
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
ISSN :
1520-6149
Print_ISBN :
1-4244-0727-3
Type :
conf
DOI :
10.1109/ICASSP.2007.366954
Filename :
4218142