DocumentCode
1668537
Title
Spatial and temporal alignment of multimodal human speech production data: Real time imaging, flesh point tracking and audio
Author
Kim, Jangwon ; Lammert, Adam ; Ghosh, Prosenjit ; Narayanan, Shrikanth S.
Author_Institution
Univ. of Southern California, Los Angeles, CA, USA
fYear
2013
Firstpage
3637
Lastpage
3641
Abstract
In speech production research, integrating articulatory data derived from multiple measurement modalities can provide a rich description of vocal tract dynamics by overcoming the limited spatio-temporal representations offered by individual modalities. This paper presents a spatial and temporal alignment method between two promising modalities, using a corpus of TIMIT sentences obtained from the same speaker: flesh point tracking from Electromagnetic Articulography (EMA), which offers high temporal resolution but sparse spatial information, and real-time Magnetic Resonance Imaging (MRI), which offers good spatial detail but at lower temporal rates. Spatial alignment is performed using palate tracking from EMA, but distortion in the MRI audio and variability in the articulatory data make temporal alignment challenging. This paper proposes a novel alignment technique based on joint acoustic-articulatory features, combining dynamic time warping with automatic feature extraction from MRI images. Experimental results show that the temporal alignment obtained with this technique is better (by 12% relative) than that obtained using acoustic features only.
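The temporal alignment step described in the abstract can be illustrated with a minimal sketch of classic dynamic time warping over per-frame feature vectors. This is not the authors' implementation: the function names `dtw_path` and `joint_features`, the Euclidean frame distance, the unconstrained step pattern, and the `weight` parameter for the articulatory stream are all assumptions made for illustration, and the abstract does not specify how the acoustic and articulatory features are combined.

```python
import math


def joint_features(acoustic, articulatory, weight=1.0):
    """Concatenate per-frame acoustic and articulatory vectors.

    Hypothetical helper: `weight` scales the articulatory part and is an
    assumption, not a parameter taken from the paper.
    """
    return [list(x) + [weight * v for v in y]
            for x, y in zip(acoustic, articulatory)]


def dtw_path(a, b):
    """Align two sequences of equal-dimension frames with classic DTW.

    Returns (total_cost, path), where path lists matched (i, j) frame pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    # Backtrack from the final cell to recover the warping path
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path
```

In such a sketch, each recording (for example one EMA session and one MRI session of the same sentence) would first be turned into a joint feature sequence with `joint_features`, and the two sequences would then be passed to `dtw_path`; both streams would need to be resampled to a common frame rate beforehand, which is left out here.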
Keywords
biomedical MRI; feature extraction; medical image processing; real-time systems; speech processing; EMA; MRI images; electromagnetic articulography; feature extraction; flesh point tracking; magnetic resonance imaging; multimodal human speech production data; real time imaging; spatial alignment; spatio-temporal representations; speech production research; temporal alignment; vocal tract dynamics; Magnetic resonance imaging; Mel frequency cepstral coefficient; Production; Sensors; Speech; Trajectory; EMA; MRI; Speech production; TIMIT corpus; automatic feature extraction; spatial alignment; temporal alignment;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location
Vancouver, BC
ISSN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2013.6638336
Filename
6638336