Acoustic model training using feature vectors generated by manipulating speech parameters of real speakers

Author

Kawai, Takaaki ; Kitaoka, Norihide ; Takeda, Kenji

Author_Institution

Nagoya Univ., Nagoya, Japan

fYear

2012

fDate

3-6 Dec. 2012

Firstpage

1

Lastpage

5

Abstract

In this paper, we propose a robust speaker-independent acoustic model training method based on generative training, in which many pseudo-speakers are generated from a small number of real speakers. We focus on differences in vocal tract length between speakers and manipulate this parameter to create many pseudo-speakers covering a range of vocal tract lengths. This method employs frequency warping based on the inverse use of Vocal Tract Length Normalization (VTLN). Another way to create pseudo-speakers is to vary the speaking rate of the speakers, which can be achieved with a method called PICOLA (Pointer Interval Controlled OverLap and Add). In experiments, we train acoustic models using these generated pseudo-speakers in addition to the original speakers. Evaluation results show that generating pseudo-speakers by manipulating speaking rates did not yield a sufficient increase in performance; vocal tract length warping, however, was effective.
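The frequency-warping idea in the abstract can be illustrated with a minimal sketch. It assumes a piecewise-linear warping function of the kind commonly used for VTLN; the abstract does not state which warping function, warping factors, or sampling rate the authors used, so the names `warp_frequency` and `pseudo_speaker_alphas`, the `knee_ratio` parameter, and the 0.9 to 1.1 factor range are all hypothetical.

```python
def warp_frequency(f_hz, alpha, f_nyquist, knee_ratio=0.8):
    """Piecewise-linear frequency warp (an assumed form, not the paper's).

    Below the knee the frequency axis is scaled by alpha; above it, a second
    linear segment is used so that f_nyquist still maps to f_nyquist.
    """
    # Place the knee so that alpha * knee stays below the Nyquist frequency.
    knee = knee_ratio * f_nyquist / max(alpha, 1.0)
    if f_hz <= knee:
        return alpha * f_hz
    # Slope of the upper segment joining (knee, alpha*knee) to (nyquist, nyquist).
    slope = (f_nyquist - alpha * knee) / (f_nyquist - knee)
    return alpha * knee + slope * (f_hz - knee)


def pseudo_speaker_alphas(low=0.9, high=1.1, count=5):
    """Evenly spaced warping factors, one per pseudo-speaker (hypothetical range)."""
    if count < 2:
        return [1.0]
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


if __name__ == "__main__":
    f_nyquist = 8000.0  # assumes 16 kHz sampled speech
    centres = [500.0, 1000.0, 2000.0, 4000.0, 7000.0]  # example filter-bank centres
    for alpha in pseudo_speaker_alphas():
        warped = [round(warp_frequency(f, alpha, f_nyquist), 1) for f in centres]
        print(f"alpha={alpha:.2f}: {warped}")
```

In this sketch, each warping factor stands for one pseudo-speaker derived from a real speaker: the warped centre frequencies would be applied when computing filter-bank features, so the same utterance yields a different feature vector sequence per factor. Speaking-rate modification with PICOLA operates on the waveform in the time domain and is not covered here.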

Keywords

learning (artificial intelligence); speech processing; PICOLA; Pointer Interval Controlled OverLap and Add; VTLN; feature vectors; generative training; pseudo-speaker generation; pseudo-speakers; speaker-independent acoustic model training method; speech parameter manipulation; vocal tract length normalization; Accuracy; Decoding; Filter banks; Robustness; Vectors

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific

Conference_Location

Hollywood, CA

Print_ISBN

978-1-4673-4863-8

Type

conf

Filename

6411771