Soft-clustering technique for training data in age- and gender-independent speech recognition

Author

Enami, D. ; Faqiang Zhu ; Yamamoto, Koji ; Nakagawa, Sachiko

Author_Institution

Dept. of Comput. Sci. & Eng., Toyohashi Univ. of Technol., Toyohashi, Japan

fYear

2012

fDate

3-6 Dec. 2012

Firstpage

1

Lastpage

4

Abstract

In this paper, we propose Gaussian mixture model (GMM)-based soft clustering of training data and GMM- and/or hidden Markov model (HMM)-based cluster selection for age- and gender-independent speech recognition. In speaker-class-dependent speech recognition, increasing the number of speaker classes typically yields more specific models and thus better recognition performance. However, as the number of classes grows, the amount of training data available for each class model shrinks, which leads to unreliable model parameters. To address this reduction in training data, we propose a GMM-based soft clustering method that allows overlap between clusters, and a method for selecting a speaker model using a GMM and/or an HMM. In an experiment, we obtained a 5.0% absolute reduction in word error rate (WER), corresponding to a 24.9% relative reduction, over an age- and gender-dependent baseline.
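
The overlap idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the assignment criterion, so the `ratio` threshold, the equal cluster priors, the use of a single diagonal Gaussian per cluster (instead of a full GMM), and all function names here are assumptions made for illustration. A training vector is assigned to every cluster whose posterior is close enough to the best one, so a speaker near a class boundary contributes data to more than one class model; at recognition time the cluster with the highest posterior is selected.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, clusters):
    """Posterior probability of each cluster given x, assuming equal priors."""
    logs = [log_gauss(x, c["mean"], c["var"]) for c in clusters]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]


def soft_assign(x, clusters, ratio=0.2):
    """Indices of all clusters whose posterior is at least ratio * best posterior.

    `ratio` is a hypothetical overlap threshold, not a value from the paper.
    """
    post = posteriors(x, clusters)
    best = max(post)
    return [k for k, p in enumerate(post) if p >= ratio * best]


def select_cluster(x, clusters):
    """Index of the single most probable cluster (used at recognition time)."""
    post = posteriors(x, clusters)
    return max(range(len(post)), key=post.__getitem__)


# Two toy one-dimensional speaker classes (made-up parameters).
clusters = [
    {"mean": [0.0], "var": [1.0]},
    {"mean": [4.0], "var": [1.0]},
]

boundary = soft_assign([2.0], clusters)   # equidistant from both classes
central = soft_assign([0.0], clusters)    # clearly inside the first class
chosen = select_cluster([3.5], clusters)  # closer to the second class
```

With these toy parameters the boundary vector is shared by both clusters, while the central vector stays in one. Lowering `ratio` increases the overlap and therefore the amount of data per class model, at the cost of less specific models.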

Keywords

Gaussian processes; hidden Markov models; learning (artificial intelligence); speech recognition; GMM; Gaussian mixture model; HMM-based cluster selection; WER; age-independent speech recognition; gender-independent speech recognition; hidden Markov model; soft clustering; soft-clustering technique; speaker model; speaker-class-dependent speech recognition; training data reduction; word error rate; Adaptation models; Context modeling; Educational institutions; Hidden Markov models; Lead; Training;

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific

Conference_Location

Hollywood, CA

Print_ISBN

978-1-4673-4863-8

Type

conf

Filename

6411780