DocumentCode :
3195093
Title :
Speaker Segmentation and Adaptation for Speech Recognition on Multiple-Speaker Audio Conference Data
Author :
Liu, Zhu ; Saraclar, Murat
Author_Institution :
AT&T Labs-Research, Middletown
fYear :
2007
fDate :
2-5 July 2007
Firstpage :
192
Lastpage :
195
Abstract :
In this paper, we address how to improve automatic speech recognition (ASR) performance on audio conference data through speaker segmentation and speaker adaptation. We propose a new speaker segmentation method that automatically determines both speaker turns and speaker labels. For speaker adaptation, we use Vocal Tract Length Normalization (VTLN) and Maximum Likelihood Linear Regression (MLLR). On a corpus of multi-speaker teleconferences, these techniques reduce the word error rate of the ASR system by more than 4% absolute.
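The abstract reports its gain as an absolute word error rate (WER) reduction. The sketch below shows, under stated assumptions, how WER is commonly computed (word-level edit distance divided by the number of reference words) and how an absolute reduction differs from a relative one. The function names and the 30%/26% figures are hypothetical and for illustration only; they are not the paper's scoring setup or results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] holds the minimum edits turning ref[:i] into hyp[:j].
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,             # deletion
                dp[i][j - 1] + 1,             # insertion
                dp[i - 1][j - 1] + sub_cost,  # match or substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)


def absolute_and_relative_reduction(baseline_wer: float, new_wer: float):
    """Return (absolute, relative) WER reduction; both as fractions."""
    absolute = baseline_wer - new_wer
    relative = absolute / baseline_wer
    return absolute, relative


# Hypothetical numbers: a drop from 30% to 26% WER.
absolute, relative = absolute_and_relative_reduction(0.30, 0.26)
print(f"absolute: {absolute:.2%}, relative: {relative:.2%}")
```

With these hypothetical figures, a 4-point absolute reduction corresponds to roughly a 13% relative reduction (4/30), which is why stating "absolute" matters when comparing reported gains.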
Keywords :
maximum likelihood estimation; regression analysis; speaker recognition; automatic speech recognition performance; maximum likelihood linear regression; multiple-speaker audio conference data; speaker adaptation; speaker segmentation; speech recognition; vocal tract length normalization; Adaptation model; Automatic speech recognition; Iterative algorithms; Loudspeakers; Maximum likelihood linear regression; NIST; Speech enhancement; Speech recognition; Streaming media; Teleconferencing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Multimedia and Expo, 2007 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
1-4244-1016-9
Electronic_ISBN :
1-4244-1017-7
Type :
conf
DOI :
10.1109/ICME.2007.4284619
Filename :
4284619