The metamorphic algorithm: a speaker mapping approach to data augmentation

Author

Bellegarda, Jerome R. ; De Souza, Peter V. ; Nádas, Arthur ; Nahamoo, David ; Picheny, Michael A. ; Bahl, Lalit R.

Author_Institution

IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA

Volume

2

Issue

3

fYear

1994

fDate

7/1/1994 12:00:00 AM

Firstpage

413

Lastpage

420

Abstract

Large vocabulary speaker-dependent speech recognition systems adjust to the acoustic peculiarities of each new speaker based on enrolment data provided by that speaker. Because the amount of data required grows with the sophistication of the underlying acoustic models, enrolment can become lengthy. To streamline it, it is desirable to make use of previously acquired speech data. The authors describe a data augmentation strategy based on a piecewise linear mapping between the feature space of a new speaker and that of a reference speaker. This speaker-normalizing mapping is used to transform the previously acquired data of the reference speaker onto the space of the new speaker. The performance of the resulting procedure, dubbed the metamorphic algorithm, is illustrated on an isolated utterance speech recognition task with a vocabulary of 20000 words. Results show that the metamorphic algorithm can substantially reduce the word error rate when only a limited amount of enrolment data is available. Alternatively, it achieves performance comparable to that obtained when a much greater amount of enrolment data is collected from the new speaker. The metamorphic algorithm can also be used to track spectral evolution over time, thus providing a possible means for robust speaker self-adaptation.
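The abstract describes a piecewise linear mapping from a reference speaker's feature space to a new speaker's feature space, used to transform the reference speaker's data. The sketch below illustrates that general idea only; it is not the authors' estimation procedure, which this record does not describe. It assumes (1) paired, already time-aligned feature frames from both speakers, (2) regions of the reference space defined by nearest centroid (for example, centroids from k-means), and (3) one affine map per region fitted by ordinary least squares. The function names `assign_regions`, `fit_piecewise_linear_map` and `apply_map` are hypothetical.

```python
import numpy as np


def assign_regions(frames, centroids):
    """Label each frame (T, d) with the index of its nearest centroid (K, d)."""
    d2 = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def fit_piecewise_linear_map(ref_frames, new_frames, centroids):
    """Fit one affine map per region of the reference feature space.

    Assumption: ref_frames[t] and new_frames[t] are time-aligned frames of
    the same speech sound from the reference and new speaker. Returns a list
    of K matrices M of shape (d + 1, d) such that new ~= [ref, 1] @ M.
    """
    labels = assign_regions(ref_frames, centroids)
    d = ref_frames.shape[1]
    maps = []
    for k in range(len(centroids)):
        mask = labels == k
        if mask.sum() < d + 1:
            # Too few frames to determine an affine map: fall back to the
            # identity (an arbitrary choice made for this sketch).
            M = np.vstack([np.eye(d), np.zeros((1, d))])
        else:
            # Append a constant column so the fit includes a bias term.
            X = np.hstack([ref_frames[mask], np.ones((mask.sum(), 1))])
            M, *_ = np.linalg.lstsq(X, new_frames[mask], rcond=None)
        maps.append(M)
    return maps


def apply_map(frames, centroids, maps):
    """Transform reference-speaker frames with the map of their region."""
    labels = assign_regions(frames, centroids)
    X = np.hstack([frames, np.ones((len(frames), 1))])
    out = np.empty_like(frames, dtype=float)
    for k, M in enumerate(maps):
        mask = labels == k
        out[mask] = X[mask] @ M
    return out
```

In a data augmentation setting, one would fit the maps on the small set of paired enrolment frames, then call `apply_map` on the reference speaker's full corpus and pool the transformed frames with the new speaker's own enrolment data. A piecewise (per-region) map can follow speaker differences that vary across the acoustic space, which a single global affine transform cannot.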

Keywords

hidden Markov models; piecewise-linear techniques; speech recognition; HMM; acoustic models; data augmentation; enrolment data; feature space; isolated utterance speech recognition; large vocabulary systems; metamorphic algorithm; performance; piecewise linear mapping; reference speaker; robust speaker self-adaptation; speaker mapping; speaker-dependent speech recognition; speaker-normalizing mapping; spectral evolution tracking; speech data; word error rate; Error analysis; Hidden Markov models; Loudspeakers; Natural languages; Parameter estimation; Piecewise linear techniques; Prototypes; Robustness; Speech recognition; Vocabulary;

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/89.294355

Filename

294355