Title :
Spatial and temporal alignment of multimodal human speech production data: Real time imaging, flesh point tracking and audio
Author :
Kim, Jangwon ; Lammert, Adam ; Ghosh, Prosenjit ; Narayanan, Shrikanth S.
Author_Institution :
Univ. of Southern California, Los Angeles, CA, USA
Abstract :
In speech production research, integrating articulatory data from multiple measurement modalities can provide a rich description of vocal tract dynamics by overcoming the limited spatio-temporal representations offered by individual modalities. This paper presents a method for spatial and temporal alignment of two promising modalities, using a corpus of TIMIT sentences recorded from the same speaker: flesh point tracking from Electromagnetic Articulography (EMA), which offers high temporal resolution but sparse spatial information, and real-time Magnetic Resonance Imaging (MRI), which offers good spatial detail but a lower temporal rate. Spatial alignment is performed using palate tracking from EMA, but distortion in the MRI audio and variability in the articulatory data make temporal alignment challenging. This paper proposes a novel alignment technique based on joint acoustic-articulatory features, combining dynamic time warping with automatic feature extraction from MRI images. Experimental results show that the temporal alignment obtained with this technique is better (12% relative) than that obtained with acoustic features alone.
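As a rough illustration of the dynamic time warping step mentioned in the abstract, the following Python sketch aligns two feature sequences with a textbook DTW. It is not the authors' implementation: the function names `joint_features` and `dtw_path`, the `weight` parameter, the Euclidean frame distance, and the assumption that acoustic and articulatory features are already sampled at the same frame rate within each modality are all hypothetical choices made for this example.

```python
import numpy as np


def joint_features(acoustic, articulatory, weight=1.0):
    """Stack per-frame acoustic and articulatory features side by side.

    Assumption: both arrays have shape (frames, dims) with the same number
    of frames. `weight` is a hypothetical knob that scales the articulatory
    part relative to the acoustic part.
    """
    return np.hstack([acoustic, weight * articulatory])


def dtw_path(x, y):
    """Align two feature sequences of shape (frames, dims) with classic DTW.

    Returns the total alignment cost and the warping path as a list of
    (index_in_x, index_in_y) pairs.
    """
    n, m = len(x), len(y)
    # Euclidean distance between every frame of x and every frame of y.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)

    # Accumulated cost, padded with one row and column of infinity.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    # Backtrack from the last cell to the first along the cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    return acc[n, m], path[::-1]
```

In this sketch, one joint feature matrix would be built per modality (for example, EMA-side audio features with EMA-derived articulatory features, and MRI-side audio features with image-derived articulatory features) and the two matrices passed to `dtw_path`; the returned path then maps frames of one recording onto frames of the other.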
Keywords :
biomedical MRI; feature extraction; medical image processing; real-time systems; speech processing; EMA; MRI images; electromagnetic articulography; flesh point tracking; magnetic resonance imaging; multimodal human speech production data; real time imaging; spatial alignment; spatio-temporal representations; speech production research; temporal alignment; vocal tract dynamics; Mel frequency cepstral coefficient; Production; Sensors; Speech; Trajectory; MRI; Speech production; TIMIT corpus; automatic feature extraction
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6638336