Unsupervised and incremental speaker adaptation under adverse environmental conditions

Author

Takagi, Keizaburo ; Shinoda, Koichi ; Hattori, Hiroaki ; Watanabe, Takao

Author_Institution

Inf. Technol. Res. Labs., NEC Corp., Kawasaki, Japan

Volume

4

fYear

1996

fDate

3-6 Oct 1996

Firstpage

2079

Abstract

A speaker adaptation method is described. In practical applications of speaker adaptation, the adaptation and testing environments change significantly and are unknown beforehand. In such cases, because speaker adaptation fits a reference pattern to the adaptation utterances with respect to differences in both environment and speaker at the same time, its performance can degrade. To cope with this problem, the proposed method first eliminates the environmental differences between each input utterance and a reference pattern by using a rapid environment adaptation algorithm based on spectrum equalization (REALISE) (K. Takagi et al., 1995). It then applies unsupervised and incremental speaker adaptation with autonomous control using tree structure pdfs (ACTS) (K. Shinoda and T. Watanabe, 1995) to the environmentally adapted reference pattern. By combining these two methods, the resulting system is expected to perform well under adverse environmental conditions and to show a stable improvement regardless of the amount of adaptation data. Evaluation experiments were carried out on utterances under three vehicle speed conditions. Recognition rates for a 100-word Japanese recognition task after 100-word adaptation improved from 92% (ACTS alone) to 95% (proposed method).

Keywords

adaptive systems; natural languages; probability; speech processing; speech recognition; tree data structures; ACTS; Japanese word recognition task; REALISE; adaptation data; adaptation utterances; adverse environmental conditions; autonomous control; environmental differences; environmentally adapted reference pattern; incremental speaker adaptation; input utterance; rapid environment adaptation algorithm; reference pattern; speaker adaptation method; spectrum equalization; tree structure pdfs; unsupervised speaker adaptation; vehicle speed conditions; Additive noise; Degradation; Information technology; National electric code; Probability density function; Speech recognition; Testing; Tree data structures; Vehicles; Working environment noise

fLanguage

English

Publisher

IEEE

Conference_Title

Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on

Conference_Location

Philadelphia, PA

Print_ISBN

0-7803-3555-4

Type

conf

DOI

10.1109/ICSLP.1996.607211

Filename

607211