Title :
Unsupervised speaker adaptation based on hierarchical spectral clustering
Author_Institution :
NTT Human Interface Lab., Tokyo, Japan
Date :
12/1/1989
Abstract :
The author proposes an automatic speaker adaptation algorithm for speech recognition, in which a small amount of training material of unspecified text can be used. The algorithm is easily applied to vector-quantization (VQ)-based speech recognition systems consisting of a VQ codebook and a word dictionary in which each word is represented as a sequence of codebook entries. In the adaptation algorithm, the VQ codebook is modified for each new speaker, whereas the word dictionary is used unchanged for all speakers. The important feature of this algorithm is that a set of spectra in training frames and the codebook entries are clustered hierarchically. Based on the vectors representing deviation between centroids of the training frame clusters and the corresponding codebook clusters, adaptation is performed hierarchically from small to large numbers of clusters. The spectral resolution of the adaptation process is improved accordingly. Results of recognition experiments using utterances of 100 Japanese city names show that adaptation reduces the mean word recognition error rate from 4.9% to 2.9%. Since the error rate for speaker-dependent recognition is 2.2%, the adaptation method is highly effective.
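The coarse-to-fine codebook shift described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes Euclidean distance, binary-split k-means for the hierarchy, and a plain per-cluster shift by the deviation vector. The names `nearest`, `split_clusters`, and `adapt_codebook` are hypothetical.

```python
import numpy as np

def nearest(points, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each point."""
    d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def split_clusters(entries, centroids, iters=10, eps=1e-3):
    """Double the centroid count by binary splitting, then refine with Lloyd iterations."""
    centroids = np.concatenate([centroids + eps, centroids - eps])
    for _ in range(iters):
        labels = nearest(entries, centroids)
        for k in range(len(centroids)):
            members = entries[labels == k]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[k] = members.mean(axis=0)
    return centroids

def adapt_codebook(codebook, frames, levels=3):
    """Shift codebook entries toward a new speaker's frames, from 1 up to 2**levels clusters."""
    adapted = np.asarray(codebook, dtype=float).copy()
    frames = np.asarray(frames, dtype=float)
    for level in range(levels + 1):
        # Re-cluster the current (partly adapted) codebook into at most 2**level clusters.
        centroids = adapted.mean(axis=0, keepdims=True)
        for _ in range(level):
            centroids = split_clusters(adapted, centroids)
        entry_labels = nearest(adapted, centroids)
        frame_labels = nearest(frames, centroids)
        for k in range(len(centroids)):
            in_entries = entry_labels == k
            in_frames = frame_labels == k
            if in_entries.any() and in_frames.any():
                # Deviation vector: training-frame centroid minus codebook-cluster centroid.
                deviation = frames[in_frames].mean(axis=0) - adapted[in_entries].mean(axis=0)
                adapted[in_entries] += deviation
    return adapted
```

With `levels=0` the sketch applies a single global shift; each further level doubles the maximum number of clusters, so later corrections act on progressively smaller spectral regions. The word dictionary is untouched because it refers to codebook entries only by index.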
Keywords :
spectral analysis; speech recognition; codebook; hierarchical spectral clustering; speaker adaptation; spectral resolution; training frames; training material; vector-quantization; word dictionary; Acoustics; Cities and towns; Clustering algorithms; Data mining; Dictionaries; Error analysis; Humans; Loudspeakers; Microphones
Journal_Title :
Acoustics, Speech and Signal Processing, IEEE Transactions on