Title :
Robust speech recognition with on-line unsupervised acoustic feature compensation
Author :
Buera, Luis ; Miguel, Antonio ; Lleida, Eduardo ; Saz, Óscar ; Ortega, Alfonso
Author_Institution :
Zaragoza Univ., Zaragoza
Abstract :
An on-line unsupervised hybrid compensation technique is proposed to reduce the mismatch between training and testing conditions. It combines multi-environment model-based linear normalization with a cross-probability model based on GMMs (MEMLIN CPM) and a novel acoustic model adaptation method based on rotation transformations. A set of rotation transformations is estimated from clean and MEMLIN CPM-normalized training data by linear regression in an unsupervised process. In testing, each MEMLIN CPM-normalized frame is decoded using a modified Viterbi algorithm and expanded acoustic models, which are obtained from the reference models and the set of rotation transformations. The proposed solution was evaluated on the Spanish SpeechDat Car database: MEMLIN CPM over standard ETSI front-end parameters achieves an average WER improvement of 83.89%, while the hybrid solution reaches 92.07%. The hybrid technique was also tested on the Aurora 2 database, obtaining an average improvement of 68.88% with clean training.
Keywords :
Gaussian processes; audio acoustics; compensation; decoding; estimation theory; feature extraction; matrix algebra; probability; regression analysis; speech coding; speech recognition; unsupervised learning; vectors; GMM; MEMLIN CPM-normalized training data; Viterbi algorithm; acoustic model adaptation method; cross-probability model; feature vector normalization; linear regression; multienvironment model linear normalization; normalized frame decoding; online unsupervised acoustic feature compensation; online unsupervised hybrid compensation technique; rotation matrix estimation process; rotation transformations; speech recognition; testing conditions; training conditions; Acoustic testing; Adaptation model; Databases; Decoding; Linear regression; Robustness; Speech recognition; Telecommunication standards; Training data; Viterbi algorithm; acoustic model adaptation; feature vector normalization; robust speech recognition;
Conference_Title :
Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-1746-9
Electronic_ISBN :
978-1-4244-1746-9
DOI :
10.1109/ASRU.2007.4430092
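
Illustrative sketch (not taken from the paper): the abstract states that the transformations are estimated by linear regression between clean and MEMLIN CPM-normalized training data. The snippet below shows only the generic idea of fitting a linear map plus bias between paired feature vectors by ordinary least squares. The function name `estimate_linear_transform`, the bias term, the unconstrained (not strictly orthogonal) matrix, and the synthetic data are all assumptions made for illustration; the paper's actual estimation procedure, constraints, and per-model grouping may differ.

```python
import numpy as np


def estimate_linear_transform(normalized, clean):
    """Least-squares fit of A, b such that clean ≈ normalized @ A.T + b.

    Hypothetical helper: an unconstrained regression, assumed here as a
    stand-in for the transformation estimate mentioned in the abstract.
    """
    n_frames = normalized.shape[0]
    # Append a column of ones so the bias is estimated jointly with the matrix.
    design = np.hstack([normalized, np.ones((n_frames, 1))])
    solution, *_ = np.linalg.lstsq(design, clean, rcond=None)
    A = solution[:-1].T  # (dim, dim) linear part
    b = solution[-1]     # (dim,) bias
    return A, b


# Synthetic paired frames (assumption: real data would be feature vectors).
rng = np.random.default_rng(0)
dim = 3
true_A = np.array([[0.9, -0.1, 0.0],
                   [0.1, 0.9, 0.0],
                   [0.0, 0.0, 1.0]])
true_b = np.array([0.5, -0.2, 0.1])
normalized = rng.normal(size=(200, dim))
clean = normalized @ true_A.T + true_b

A_hat, b_hat = estimate_linear_transform(normalized, clean)
```

With noise-free synthetic pairs like these, the fit should recover `true_A` and `true_b` up to numerical precision; on real features the estimate would only approximate the residual mismatch.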