DocumentCode
178390
Title
Multi-view learning with supervision for transformed bottleneck features
Author
Arora, Raman ; Livescu, Karen
Author_Institution
TTI-Chicago, Chicago, IL, USA
fYear
2014
fDate
4-9 May 2014
Firstpage
2499
Lastpage
2503
Abstract
Previous work has shown that acoustic features can be improved by unsupervised learning of transformations based on canonical correlation analysis (CCA) using articulatory measurements that are available at training time. In this paper, we investigate whether this second view (articulatory data) still helps even when labels are also available at training time. We begin with strong baseline bottleneck features, which can be learned when the training set is phonetically labeled. We then compare several options for learning transformations of the bottleneck features in the presence of both articulatory measurements and phonetic labels for the training data. The methods compared include combinations of LDA and CCA, as well as a three-view extension of CCA that simultaneously uses the labels and articulatory measurements as additional views. Phonetic recognition experiments on data from the University of Wisconsin X-ray microbeam database show that the learned features improve performance over using either just the labels or just the articulatory measurements for learning acoustic transformations.
Keywords
acoustic signal processing; correlation theory; speech recognition; unsupervised learning; CCA; LDA; Wisconsin University; X-ray microbeam database; acoustic features; acoustic transformations; articulatory data; articulatory measurements; baseline bottleneck features; canonical correlation analysis; linear discriminant analysis; multi-view learning; phonetic labels; phonetic recognition; supervision; three-view extension; training time; transformed bottleneck features; Acoustic measurements; Correlation; Mel frequency cepstral coefficient; Speech; Training; bottleneck features; supervised transformation learning
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location
Florence
Type
conf
DOI
10.1109/ICASSP.2014.6854050
Filename
6854050