• DocumentCode
    980813
  • Title

    Reliable methods for estimating relative vocal tract lengths from formant trajectories of common words

  • Author

    Watanabe, Akira ; Sakata, Tadashi

  • Author_Institution
    Kumamoto Prefectural Coll. of Technol., Kumamoto Prefecture
  • Volume
    14
  • Issue
    4
  • fYear
    2006
  • fDate
    7/1/2006
  • Firstpage
    1193
  • Lastpage
    1204
  • Abstract
    This paper describes reliable methods for estimating relative vocal tract lengths from speech signals. The two proposed methods are based on the simple principle that the resonant frequencies of an acoustic tube are inversely proportional to the tube length when the configuration is constant. We estimated the ratio between two speakers' vocal tract lengths using the first and second formant trajectories of the same words uttered by both. In the first method, referred to as the "strict estimation method", we sought instances at which the gross structures of the two vocal tracts are analogous by applying dynamic time-warping to formant trajectories of common words uttered at different speeds. In those instances, which were found among more than 100 common words uttered by two speakers, the average formant ratio proved to be an excellent estimate (errors of about ±0.1%) of the reciprocal of the vocal tract length ratio. Next, we examined a simplified method, the "direct estimation method", which estimates those ratios using all corresponding points of the two formant trajectories. Estimation errors of the direct estimation method were evaluated to be about ±0.3% at equal utterance speeds and at most ±2% when the ratio of "fast" to "slow" utterance speeds was within 2.0. Finally, we estimated relative vocal tract lengths for four Japanese speaker groups whose members differed in age and gender. Experimental results showed that the average vocal tract length of adult females and that of 7-10-year-old boys and girls are 21%, 27%, and 30%, respectively, shorter than adult males'
  • Keywords
    physiology; reliability; speaker recognition; speech; speech processing; Japanese speaker; acoustic tube; direct estimation method; dynamic time-warping; formant trajectories; relative vocal tract lengths estimation; resonant frequencies; speech signals; strict estimation method; Cities and towns; Computer science; Estimation error; Frequency estimation; Magnetic resonance imaging; Resonant frequency; Speech enhancement; Speech recognition; Speech synthesis; formant frequency; vocal tract length
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TSA.2005.860829
  • Filename
    1643648
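
The principle stated in the abstract (formant frequencies scale inversely with vocal tract length when the configuration is the same) can be illustrated with a minimal sketch of a direct-estimation-style calculation. This is not the authors' implementation: the function name `direct_length_ratio`, the input layout (equal-length lists of (F1, F2) pairs in Hz that are assumed to be already time-aligned), and the choice to pool F1 and F2 ratios into one average are all assumptions made for illustration.

```python
from statistics import mean


def direct_length_ratio(formants_a, formants_b):
    """Estimate the vocal tract length ratio L_a / L_b.

    formants_a, formants_b: equal-length sequences of (F1, F2) pairs in Hz,
    assumed (hypothetically) to be corresponding points of the same word
    uttered by speaker a and speaker b.
    """
    if not formants_a or len(formants_a) != len(formants_b):
        raise ValueError("trajectories must be non-empty and of equal length")

    ratios = []
    for (f1_a, f2_a), (f1_b, f2_b) in zip(formants_a, formants_b):
        # Each corresponding point contributes one ratio per formant.
        ratios.append(f1_a / f1_b)
        ratios.append(f2_a / f2_b)

    # The average formant ratio F_a / F_b estimates the reciprocal
    # of the length ratio, so invert it to obtain L_a / L_b.
    return 1.0 / mean(ratios)


# Synthetic example: speaker b's formants are 25% higher than speaker a's,
# so speaker a's vocal tract is estimated to be 1.25 times as long.
speaker_a = [(500.0, 1500.0), (300.0, 2200.0)]
speaker_b = [(625.0, 1875.0), (375.0, 2750.0)]
estimated_ratio = direct_length_ratio(speaker_a, speaker_b)
```

The strict estimation method described in the abstract would additionally restrict the average to instances selected after dynamic time-warping; that selection step is not sketched here.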