Direct training of subspace distribution clustering hidden Markov model

Author

Mak, Brian Kan-Wing ; Bocchieri, Enrico

Author_Institution

Dept. of Comput. Sci., Hong Kong Univ. of Sci. & Technol., Clear Water Bay, China

Volume

9

Issue

4

fYear

2001

fDate

5/1/2001 12:00:00 AM

Firstpage

378

Lastpage

387

Abstract

It generally takes a long time and requires a large amount of speech data to train hidden Markov models for a speech recognition task with a reasonably large vocabulary. Previously, we proposed a compact acoustic model called the “subspace distribution clustering hidden Markov model” (SDCHMM) with the aim of saving some of the training effort. SDCHMMs are derived by tying continuous density hidden Markov models (CDHMMs) at a finer subphonetic level, namely the subspace distributions. Experiments on the Airline Travel Information System (ATIS) task show that SDCHMMs with significantly fewer model parameters (by one to two orders of magnitude) can be converted from CDHMMs with no loss in word accuracy. With such compact acoustic models, one should be able to train SDCHMMs directly from significantly less speech data (without intermediate CDHMMs). We devise a direct SDCHMM training algorithm, assuming a priori knowledge of the subspace distribution tying structure. On the ATIS task, it is found that both a context-independent and a context-dependent speaker-independent 20-stream SDCHMM system trained with 8 min of speech perform as well as their corresponding CDHMM systems trained with 105 min and 36 h of speech, respectively.
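
The idea of tying at the subspace-distribution level can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes diagonal-covariance Gaussians, a partition of the feature dimensions into streams, and a small per-stream codebook of prototype subspace Gaussians shared by all full-space Gaussians. The function names, the data layout, and the sizes used in the parameter count are hypothetical and are not taken from the paper.

```python
import math


def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def sdchmm_gauss_logpdf(x, streams, codebooks, tying):
    """Log density of one full-space Gaussian built from tied subspace Gaussians.

    streams   : list of index lists, one per stream (assumed partition of the dims)
    codebooks : codebooks[k] is a list of (mean, var) prototypes for stream k
    tying     : tying[k] is the prototype index this Gaussian uses in stream k
    """
    total = 0.0
    for k, dims in enumerate(streams):
        mean, var = codebooks[k][tying[k]]
        # With a diagonal covariance the full-space density factorizes over
        # streams, so the per-stream log densities simply add up.
        total += diag_gauss_logpdf([x[d] for d in dims], mean, var)
    return total


def parameter_counts(n_gaussians, dim, n_prototypes):
    """Mean and variance parameter counts only (mixture weights and the
    integer tying table are ignored in this simplified comparison)."""
    cdhmm = n_gaussians * 2 * dim   # every Gaussian stores its own mean and variance
    sdchmm = n_prototypes * 2 * dim  # n_prototypes prototypes per stream, summed over streams
    return cdhmm, sdchmm
```

Under these assumptions, each full-space Gaussian is stored as one prototype index per stream rather than as its own mean and variance vectors, which is where the reduction in model parameters comes from. Knowing the tying table in advance corresponds to the a priori tying structure that the direct training algorithm assumes.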

Keywords

hidden Markov models; speech recognition; ATIS; Airline Travel Information System; CDHMM; SDCHMM; compact acoustic model; context-dependent speaker-independent system; context-independent system; continuous density hidden Markov models; direct SDCHMM training algorithm; direct training; hidden Markov model; model parameters; speech data; speech recognition; subphonetic level; subspace distribution clustering HMM; subspace distributions; word accuracy; Associate members; Automatic speech recognition; Computer science; Hidden Markov models; Information systems; Laboratories; Parameter estimation; Speech recognition; Training data; Vocabulary;

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/89.917683

Filename

917683