Spatial and temporal alignment of multimodal human speech production data: Real time imaging, flesh point tracking and audio

Author

Kim, Jangwon ; Lammert, Adam ; Ghosh, Prosenjit ; Narayanan, Shrikanth S.

Author_Institution

Univ. of Southern California, Los Angeles, CA, USA

fYear

2013

Firstpage

3637

Lastpage

3641

Abstract

In speech production research, integrating articulatory data from multiple measurement modalities can provide a rich description of vocal tract dynamics by overcoming the limited spatio-temporal representations offered by individual modalities. This paper presents a method for spatial and temporal alignment of two promising modalities, using a corpus of TIMIT sentences recorded from the same speaker: flesh point tracking from Electromagnetic Articulography (EMA), which offers high temporal resolution but sparse spatial information, and real-time Magnetic Resonance Imaging (MRI), which offers good spatial detail but at lower temporal rates. Spatial alignment is done using palate tracking from EMA, but distortion in the MRI audio and variability in the articulatory data make temporal alignment challenging. This paper proposes a novel alignment technique using joint acoustic-articulatory features, which combines dynamic time warping with automatic feature extraction from MRI images. Experimental results show that the temporal alignment obtained with this technique is 12% (relative) better than that obtained using acoustic features only.
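The general idea of aligning two recordings by dynamic time warping over joint features can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `joint_features` and `dtw_path`, the `weight` parameter, the Euclidean frame distance, and the assumption that acoustic and articulatory features are already frame-synchronous within each recording are all hypothetical choices made here for illustration.

```python
import math


def joint_features(acoustic, articulatory, weight=1.0):
    """Concatenate per-frame acoustic and articulatory vectors.

    Assumption: both streams have one vector per frame at the same frame rate.
    `weight` (hypothetical) scales the articulatory part relative to the acoustic part.
    """
    return [list(ac) + [weight * x for x in ar]
            for ac, ar in zip(acoustic, articulatory)]


def dtw_path(a, b):
    """Classic DTW between two sequences of equal-dimension feature vectors.

    Returns (total_cost, path), where path is a list of (i, j) frame pairs.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j - 1],  # match
                                 cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1])      # advance in b only
    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path
```

In such a sketch, the features of one recording (for example, acoustic features plus features extracted from MRI images) would be passed through `joint_features` and aligned against the correspondingly constructed features of the other recording; the returned path maps frames of one sequence to frames of the other.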

Keywords

biomedical MRI; feature extraction; medical image processing; real-time systems; speech processing; EMA; MRI images; electromagnetic articulography; feature extraction; flesh point tracking; magnetic resonance imaging; multimodal human speech production data; real time imaging; spatial alignment; spatio-temporal representations; speech production research; temporal alignment; vocal tract dynamics; Magnetic resonance imaging; Mel frequency cepstral coefficient; Production; Sensors; Speech; Trajectory; EMA; MRI; Speech production; TIMIT corpus; automatic feature extraction; spatial alignment; temporal alignment;

fLanguage

English

Publisher

ieee

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location

Vancouver, BC

ISSN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2013.6638336

Filename

6638336