DocumentCode
77456
Title
Dynamic 3-D Visualization of Vocal Tract Shaping During Speech
Author
Yinghua Zhu ; Yoon-Chul Kim ; Proctor, M.I. ; Narayanan, Shrikanth S. ; Nayak, Krishna S.
Author_Institution
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
Volume
32
Issue
5
fYear
2013
fDate
May 2013
Firstpage
838
Lastpage
848
Abstract
Noninvasive imaging is widely used in speech research as a means to investigate the shaping and dynamics of the vocal tract during speech production. 3-D dynamic MRI would be a major advance, as it would provide 3-D dynamic visualization of the entire vocal tract. We present a novel method for the creation of 3-D dynamic movies of vocal tract shaping based on the acquisition of 2-D dynamic data from parallel slices and temporal alignment of the image sequences using audio information. Multiple sagittal 2-D real-time movies with synchronized audio recordings are acquired for English vowel-consonant-vowel stimuli /ala/, /ara/, /asa/, and /aʃa/. Audio data are aligned using mel-frequency cepstral coefficients (MFCC) extracted from windowed intervals of the speech signal. Sagittal image sequences acquired from all slices are then aligned using dynamic time warping (DTW). The aligned image sequences enable dynamic 3-D visualization: synthesized movies of the moving airway are created in the coronal planes, and desired tissue surfaces and the tube-shaped vocal tract airway are visualized after manual segmentation of the targeted articulators followed by smoothing. The resulting volumes allow for dynamic 3-D visualization of salient aspects of lingual articulation, including the formation of tongue grooves and sublingual cavities, with a temporal resolution of 78 ms.
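The alignment step summarized above can be illustrated with a minimal dynamic time warping (DTW) sketch. It assumes each slice's audio has already been reduced to a sequence of per-window feature vectors (for example MFCC frames), and that one slice serves as the reference timeline. The function names `dtw_align` and `ref_to_query`, the Euclidean frame distance, the standard three-step pattern, and the toy one-dimensional vectors are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def frame_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors (e.g. MFCC frames).

    Assumption: the paper's exact frame distance is not specified here.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(ref: List[Vector], query: List[Vector]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW with steps (1,0), (0,1), (1,1).

    Returns the total alignment cost and the warping path as a list of
    (ref_index, query_index) pairs from (0, 0) to (len(ref)-1, len(query)-1).
    """
    n, m = len(ref), len(query)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning ref[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(ref[i - 1], query[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance ref only
                                 cost[i][j - 1],      # advance query only
                                 cost[i - 1][j - 1])  # advance both
    # Backtrack from the end, always moving to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        preds = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(preds, key=preds.get)
    path.reverse()
    return cost[n][m], path


def ref_to_query(path: List[Tuple[int, int]], n_ref: int) -> List[int]:
    """Hypothetical helper: first query frame aligned to each reference frame."""
    mapping = [-1] * n_ref
    for r, q in path:
        if mapping[r] == -1:
            mapping[r] = q
    return mapping


if __name__ == "__main__":
    reference = [[0.0], [1.0], [2.0]]    # toy "reference slice" features
    slice_k = [[0.0], [0.0], [1.0], [2.0]]  # same utterance, spoken slightly slower
    total, warp = dtw_align(reference, slice_k)
    print(total, warp, ref_to_query(warp, len(reference)))
```

In this toy case the second sequence repeats its first frame, so DTW absorbs the delay at zero cost and maps reference frames 0, 1, 2 to query frames 0, 2, 3. Under the stated assumptions, such a mapping could be used to resample each slice's image frames onto the reference timeline before stacking slices into a volume; the exact step pattern and distance used in the paper may differ.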
Keywords
biomechanics; biomedical MRI; image sequences; medical image processing; speech; 2D dynamic data; 3D dynamic MRI; 3D dynamic movies; 3D dynamic visualization; DTW; English vowel-consonant-vowel stimuli; MFCC; audio information; dynamic time warping; image sequence temporal alignment; mel frequency cepstral coefficients; noninvasive imaging; parallel slices; sagittal 2D real time movies; speech production; tube shaped vocal tract airway; vocal tract shaping; Image reconstruction; Magnetic resonance imaging; Mel frequency cepstral coefficient; Real-time systems; Speech; Tongue; Articulation; real-time magnetic resonance imaging (MRI); retrospective gating; Adult; Humans; Imaging, Three-Dimensional; Magnetic Resonance Imaging; Male; Signal Processing, Computer-Assisted; Speech Production Measurement; Vocal Cords
fLanguage
English
Journal_Title
Medical Imaging, IEEE Transactions on
Publisher
IEEE
ISSN
0278-0062
Type
jour
DOI
10.1109/TMI.2012.2230017
Filename
6362229