DocumentCode
417263
Title
Studies in massively speaker-specific speech recognition
Author
Shi, Yu ; Chang, Eric
Volume
1
fYear
2004
fDate
17-21 May 2004
Abstract
Over the past several years, the primary focus of the speech-recognition research community has been speaker-independent speech recognition, with an emphasis on databases containing larger and larger numbers of speakers. For example, the recent DARPA-sponsored EARS program calls for recordings of thousands of speakers. In contrast, we are interested in making a speech interface work well for one particular individual, and we propose using massive amounts of speaker-specific training data recorded in daily life. We call this massively speaker-specific recognition (MSSR). As a preliminary study, we leverage the large corpus available from our speech-synthesis work to examine the benefit of MSSR from the acoustic-modeling perspective alone. Initial results show that shifting the focus to MSSR reduces word error rates very significantly. MSSR also outperforms speaker-adaptive speech recognition systems, since its model parameters can be tuned to suit one particular individual.
Keywords
error statistics; learning (artificial intelligence); natural language interfaces; speech recognition; speech-based user interfaces; massively speaker-specific recognition; massively speaker-specific speech recognition; speaker-adaptive speech recognition; speaker-independent speech recognition; speech interface; speech synthesis; training data; word error rates; Asia; Databases; Ear; Error analysis; Maximum likelihood linear regression; Mobile handsets; Software systems; Speech recognition; Training data
fLanguage
English
Publisher
ieee
Conference_Titel
2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04). Proceedings.
ISSN
1520-6149
Print_ISBN
0-7803-8484-9
Type
conf
DOI
10.1109/ICASSP.2004.1326113
Filename
1326113
Link To Document