DocumentCode :
3600164
Title :
Dimensional reduction, covariance modeling, and computational complexity in ASR systems
Author :
Axelrod, Scott ; Gopinath, Ramesh ; Olsen, Peder ; Visweswariah, Karthik
Author_Institution :
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Volume :
1
fYear :
2003
Abstract :
We study acoustic modeling for speech recognition using mixtures of exponential models with linear and quadratic features tied across all context-dependent states. These models are one version of the SPAM models introduced by Axelrod, Gopinath and Olsen (see Proc. ICSLP, 2002). They generalize diagonal covariance, MLLT, EMLLT, and full covariance models. Reducing the dimension of the acoustic vectors with LDA/HDA projections corresponds to a special case of reducing the exponential model feature space. We see, in one speech recognition task, that SPAM models on LDA-projected spaces of varying dimensions achieve a significant fraction of the WER improvement obtained in going from MLLT to full covariance modeling, while maintaining the low computational cost of the MLLT models. Further, the feature precomputation cost can be minimized using the hybrid feature technique of Visweswariah, Olsen, Gopinath and Axelrod (see ICASSP 2003), and the number of Gaussians one needs to compute can be greatly reduced by hierarchical clustering of the Gaussians (with fixed feature space). Finally, we show that reducing the quadratic and linear feature spaces separately produces models with better accuracy than, but computational complexity comparable to, LDA/HDA-based models.
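The computational saving described in the abstract comes from sharing one basis of quadratic features across all Gaussians: each precision matrix is a weighted sum of tied basis matrices, so the expensive per-frame quadratic terms are computed once and reused. The following is a minimal numerical sketch of that idea, with toy dimensions, random basis matrices, and helper names (`log_gauss_spam`, `log_gauss_direct`) that are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Sketch of the SPAM idea: every Gaussian g shares a small basis
# {S_1..S_K} of symmetric matrices; its precision is P_g = sum_k w_gk S_k.
# The quadratic features x^T S_k x and linear features S_k x are computed
# once per frame and reused across all Gaussians.

rng = np.random.default_rng(0)
d, K, G = 4, 2, 3  # feature dim, basis size, number of Gaussians (toy sizes)

# Shared symmetric basis matrices (random PSD here, purely for illustration).
basis = []
for _ in range(K):
    A = rng.standard_normal((d, d))
    basis.append(A @ A.T)

# Positive per-Gaussian weights keep each precision positive definite.
weights = rng.uniform(0.5, 1.5, size=(G, K))
means = rng.standard_normal((G, d))

x = rng.standard_normal(d)

# Per-frame precomputation, shared by every Gaussian.
q = np.array([x @ S @ x for S in basis])      # quadratic features x^T S_k x
lin = np.array([S @ x for S in basis])        # linear features S_k x, (K, d)

def log_gauss_spam(g):
    """Gaussian log-density (up to the normalizer) via tied features."""
    mu = means[g]
    xPx = weights[g] @ q                                  # x^T P_g x
    muPx = sum(weights[g, k] * (mu @ lin[k]) for k in range(K))  # mu^T P_g x
    P = sum(weights[g, k] * basis[k] for k in range(K))
    muPmu = mu @ P @ mu  # depends only on g; precomputable offline
    return -0.5 * (xPx - 2.0 * muPx + muPmu)

def log_gauss_direct(g):
    """Reference evaluation with an explicit per-Gaussian d x d product."""
    P = sum(weights[g, k] * basis[k] for k in range(K))
    diff = x - means[g]
    return -0.5 * diff @ P @ diff
```

Expanding `(x - mu)^T P (x - mu)` shows the two evaluations agree exactly; the tied-feature form just moves all d x d work into the shared per-frame precomputation, which is why SPAM models can approach full-covariance accuracy at near-MLLT cost.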
Keywords :
Gaussian processes; acoustic signal processing; computational complexity; covariance matrices; feature extraction; parameter estimation; speech recognition; ASR systems; EMLLT; Gaussians; LDA-projected space; LDA/HDA-based models; LDA/HDA projections; MLLT; SPAM models; WER; acoustic modeling; acoustic vectors; context-dependent states; covariance modeling; diagonal covariance; dimensional reduction; exponential model feature space; exponential models; feature precomputation cost; feature space; full covariance models; hierarchical clustering; hybrid feature technique; linear features; model accuracy; quadratic features; automatic speech recognition; computational efficiency; computational modeling; context modeling; linear discriminant analysis;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Proceedings
ISSN :
1520-6149
Print_ISBN :
0-7803-7663-3
Type :
conf
DOI :
10.1109/ICASSP.2003.1198918
Filename :
1198918