DocumentCode :
1688660
Title :
Multi-view CCA-based acoustic features for phonetic recognition across speakers and domains
Author :
Arora, Raman ; Livescu, Karen
Author_Institution :
Toyota Technological Institute at Chicago (TTIC), Chicago, IL, USA
fYear :
2013
Firstpage :
7135
Lastpage :
7139
Abstract :
Canonical correlation analysis (CCA) and kernel CCA can be used for unsupervised learning of acoustic features when a second view (e.g., articulatory measurements) is available for some training data, and such projections have been used to improve phonetic frame classification. Here we study the behavior of CCA-based acoustic features on the task of phonetic recognition, and investigate to what extent they are speaker-independent or domain-independent. The acoustic features are learned using data drawn from the University of Wisconsin X-ray Microbeam Database (XRMB). The features are evaluated within and across speakers on XRMB data, as well as on out-of-domain TIMIT and MOCHA-TIMIT data. Experimental results show consistent improvement with the learned acoustic features over baseline MFCCs and PCA projections. In both speaker-dependent and cross-speaker experiments, phonetic error rates are improved by 4-9% absolute (10-23% relative) using CCA-based features over baseline MFCCs. In cross-domain phonetic recognition (training on XRMB and testing on MOCHA or TIMIT), the learned projections provide smaller improvements.
Keywords :
cepstral analysis; correlation methods; speaker recognition; unsupervised learning; MOCHA-TIMIT data; Mel frequency cepstral coefficients; PCA projections; University of Wisconsin; X-ray Microbeam Database; XRMB data; articulatory measurements; baseline MFCC; canonical correlation analysis; domain-independent; kernel CCA; multiview CCA-based acoustic features; out-of-domain TIMIT; phonetic error rates; phonetic frame classification; phonetic recognition; principal components analysis; speaker-independent; Acoustic measurements; Acoustics; Hidden Markov models; Kernel; Speech; Speech recognition; Training; MOCHA-TIMIT; TIMIT; XRMB; domain-independence; multi-view learning; speaker-independence
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639047
Filename :
6639047