• DocumentCode
    60412
  • Title
    Glottal and Vocal Tract Characteristics of Voice Impersonators
  • Author
    Bin Amin, Talal ; Marziliano, Pina ; German, James Sneed
  • Author_Institution
    Sch. of Electr. & Electron. Eng., Nanyang Technol. Univ., Singapore, Singapore
  • Volume
    16
  • Issue
    3
  • fYear
    2014
  • fDate
    Apr-14
  • Firstpage
    668
  • Lastpage
    678
  • Abstract
    Voice impersonators possess a flexible voice which allows them to imitate and create different voice identities. These impersonations present a challenge for forensic analysis and speaker identification systems. To better understand the phenomena underlying successful voice impersonation, we collected a database of synchronous speech and ElectroGlottoGraphic (EGG) signals from three voice impersonators each producing nine distinct voice identities. We analyzed glottal and vocal tract measures including F0, speech rate, vowel formant frequencies, and timing characteristics of the vocal folds. Our analysis confirmed that the impersonators modulated all four parameters in producing the voices, and provides a lower bound on the scale of variability that is available to impersonators. Importantly, vowel formant differences across voices were highly dependent on vowel category, showing that such effects cannot be captured by global transformations that ignore the linguistic parse. We address this issue through the development of a no-reference objective metric based on the vowel-dependent variance of the formants associated with each voice. This metric both ranks the impersonators' natural voices highly, and correlates strongly with the results of a subjective listening test. Together, these results demonstrate the utility of voice variability data for the development of voice disguise detection and speaker identification applications.
  • Keywords
    art; speech recognition; EGG signals; ElectroGlottoGraphic signals; forensic analysis; glottal tract characteristics; glottal tract measures; linguistic parse; no-reference objective metric; speaker identification applications; speaker identification systems; speech rate; subjective listening test; synchronous speech signals; timing characteristics; variability scale; vocal tract characteristics; vocal tract measures; voice disguise detection; voice identity; voice impersonation; voice impersonators; voice variability data; vowel category; vowel formant frequencies; vowel-dependent variance; Acoustics; Educational institutions; Forensics; Frequency measurement; Materials; Pragmatics; Speech; Acoustic; disguise; formant; glottal; open quotient; speech rate; vocal tract; voice identity; voice impersonator;
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type
    jour
  • DOI
    10.1109/TMM.2014.2300071
  • Filename
    6712131