DocumentCode :
1109226
Title :
Frame-specific statistical features for speaker independent speech recognition
Author :
Bocchieri, Enrico L. ; Doddington, George R.
Author_Institution :
Texas Instruments, Dallas, TX, USA
Volume :
34
Issue :
4
fYear :
1986
fDate :
8/1/1986
Firstpage :
755
Lastpage :
764
Abstract :
The performance of current speaker independent speech recognition technology is limited by the inadequacy of the measures of the speech data to discriminate between different speech sounds. In particular, two critical assumptions that underlie and limit most current recognition techniques are that: 1) speech data from different frames are statistically independent (e.g., there are no between-frame interactions); and 2) speech data statistics are independent of phonetic events (e.g., distance measures are fixed and independent of input or reference speech). In the context of speaker independent isolated digit recognition, improved recognition performance is demonstrated by: 1) explicitly modeling the correlation between spectral measurements of adjacent frames; and 2) using a distance measure which is a function of the recognition reference frame being used. A statistical model was created from a 2464 token database (2 tokens of each of 11 words "zero" through "nine" and "oh") for 112 speakers. Primary features include energy and filter bank amplitudes. Interspeaker variability was estimated by time aligning all training tokens and creating an ensemble of 224 feature vectors for each reference frame. Normal distributions were then estimated individually for each frame jointly with its neighbors. Testing was performed on a multidialect database of 2486 spoken digit tokens collected from 113 (different) speakers using maximum-likelihood decision methods. The substitution rate dropped from 1.7 to 1.4 percent with incorporation of between-frame statistics, and further to 0.6 percent with incorporation of frame-specific statistics in the likelihood model.
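Illustrative sketch (not from the paper): the abstract describes estimating a normal distribution for each reference frame jointly with its neighbors, then deciding by maximum likelihood. The Python below is a minimal reading of that idea under stated assumptions. It assumes tokens are already time-aligned to a fixed number of frames, uses NumPy, clamps neighbors at the token edges, and adds a small ridge term to keep covariances invertible. All function names and parameters are hypothetical, and summing per-frame log-likelihoods over overlapping windows is a simplification, not the authors' exact likelihood model.

```python
import numpy as np

def stack_with_neighbors(frames, t):
    """Concatenate frame t with its left and right neighbors (edges clamped)."""
    last = len(frames) - 1
    idx = [max(t - 1, 0), t, min(t + 1, last)]
    return np.concatenate([frames[i] for i in idx])

def fit_frame_models(aligned_tokens, ridge=1e-3):
    """Fit one Gaussian (mean, covariance) per reference frame.

    aligned_tokens: array of shape (n_tokens, n_frames, n_features),
    assumed to be time-aligned already. `ridge` is an assumed regularizer.
    """
    n_frames = aligned_tokens.shape[1]
    models = []
    for t in range(n_frames):
        # Ensemble of stacked feature vectors for this reference frame.
        X = np.stack([stack_with_neighbors(tok, t) for tok in aligned_tokens])
        mean = X.mean(axis=0)
        # Full covariance captures correlation between adjacent frames.
        cov = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])
        models.append((mean, cov))
    return models

def log_likelihood(token, models):
    """Sum of frame-specific Gaussian log-densities for one aligned token."""
    total = 0.0
    for t, (mean, cov) in enumerate(models):
        d = stack_with_neighbors(token, t) - mean
        _, logdet = np.linalg.slogdet(cov)
        maha = d @ np.linalg.solve(cov, d)
        total += -0.5 * (maha + logdet + len(d) * np.log(2 * np.pi))
    return total

def classify(token, word_models):
    """Pick the word whose frame models give the highest log-likelihood."""
    return max(word_models, key=lambda w: log_likelihood(token, word_models[w]))
```

Because each frame carries its own covariance, the implied distance measure changes from one reference frame to the next, which is the "frame-specific" aspect the abstract refers to.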
Keywords :
Context modeling; Current measurement; Filter bank; Gaussian distribution; Loudspeakers; Particle measurements; Spatial databases; Speech recognition; Statistics; Testing;
fLanguage :
English
Journal_Title :
Acoustics, Speech and Signal Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
0096-3518
Type :
jour
DOI :
10.1109/TASSP.1986.1164911
Filename :
1164911