DocumentCode
1688660
Title
Multi-view CCA-based acoustic features for phonetic recognition across speakers and domains
Author
Arora, Raman ; Livescu, Karen
Author_Institution
Toyota Technol. Inst. at Chicago (TTIC), Chicago, IL, USA
fYear
2013
Firstpage
7135
Lastpage
7139
Abstract
Canonical correlation analysis (CCA) and kernel CCA can be used for unsupervised learning of acoustic features when a second view (e.g., articulatory measurements) is available for some training data, and such projections have been used to improve phonetic frame classification. Here we study the behavior of CCA-based acoustic features on the task of phonetic recognition, and investigate to what extent they are speaker-independent or domain-independent. The acoustic features are learned using data drawn from the University of Wisconsin X-ray Microbeam Database (XRMB). The features are evaluated within and across speakers on XRMB data, as well as on out-of-domain TIMIT and MOCHA-TIMIT data. Experimental results show consistent improvement with the learned acoustic features over baseline MFCCs and PCA projections. In both speaker-dependent and cross-speaker experiments, phonetic error rates are improved by 4-9% absolute (10-23% relative) using CCA-based features over baseline MFCCs. In cross-domain phonetic recognition (training on XRMB and testing on MOCHA or TIMIT), the learned projections provide smaller improvements.
Keywords
cepstral analysis; correlation methods; speaker recognition; unsupervised learning; MOCHA-TIMIT data; Mel frequency cepstral coefficients; PCA projections; University of Wisconsin; X-ray Microbeam Database; XRMB data; articulatory measurements; baseline MFCC; canonical correlation analysis; domain-independent; kernel CCA; multiview CCA-based acoustic features; out-of-domain TIMIT; phonetic error rates; phonetic frame classification; phonetic recognition; principal components analysis; speaker-independent; Acoustic measurements; Acoustics; Hidden Markov models; Kernel; Speech; Speech recognition; Training; MOCHA-TIMIT; TIMIT; XRMB; domain-independence; multi-view learning; speaker-independence
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location
Vancouver, BC
ISSN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2013.6639047
Filename
6639047