• DocumentCode
    1668537
  • Title

    Spatial and temporal alignment of multimodal human speech production data: Real time imaging, flesh point tracking and audio

  • Author

    Kim, Jangwon ; Lammert, Adam ; Ghosh, Prosenjit ; Narayanan, Shrikanth S.

  • Author_Institution
    Univ. of Southern California, Los Angeles, CA, USA
  • fYear
    2013
  • Firstpage
    3637
  • Lastpage
    3641
  • Abstract
    In speech production research, the integration of articulatory data derived from multiple measurement modalities can provide a rich description of vocal tract dynamics by overcoming the limited spatio-temporal representations offered by individual modalities. This paper presents a spatial and temporal alignment method between two promising modalities, using a corpus of TIMIT sentences obtained from the same speaker: flesh point tracking from Electromagnetic Articulography (EMA), which offers high temporal resolution but sparse spatial information, and real-time Magnetic Resonance Imaging (MRI), which offers good spatial detail but at lower temporal rates. Spatial alignment is done by using palate tracking of EMA, but distortion in MRI audio and articulatory data variability make temporal alignment challenging. This paper proposes a novel alignment technique based on joint acoustic-articulatory features that combines dynamic time warping with automatic feature extraction from MRI images. Experimental results show that the temporal alignment obtained using this technique is 12% better (relative) than that obtained using acoustic features alone.
  • Keywords
    biomedical MRI; feature extraction; medical image processing; real-time systems; speech processing; EMA; MRI images; electromagnetic articulography; flesh point tracking; magnetic resonance imaging; multimodal human speech production data; real time imaging; spatial alignment; spatio-temporal representations; speech production research; temporal alignment; vocal tract dynamics; Mel frequency cepstral coefficient; Production; Sensors; Speech; Trajectory; MRI; Speech production; TIMIT corpus; automatic feature extraction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2013.6638336
  • Filename
    6638336