  • DocumentCode
    3164500
  • Title
    Detecting a targeted voice style in an audiobook using voice quality features
  • Author
    Székely, Éva ; Kane, John ; Scherer, Stefan ; Gobl, Christer ; Carson-Berndsen, Julie
  • Author_Institution
    School of Computer Science and Informatics, University College Dublin, Dublin, Ireland
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    4593
  • Lastpage
    4596
  • Abstract
    Audiobooks are known to contain a variety of expressive speaking styles that occur as a result of the narrator mimicking a character in a story, or expressing affect. Accurate modeling of this variety is essential for the purposes of speech synthesis from an audiobook. Voice quality differences are important features characterizing these different speaking styles, which are realized on a gradient and are often difficult to predict from the text. The present study uses a parameter characterizing breathy to tense voice qualities using features of the wavelet transform, and a measure for identifying creaky segments in an utterance. Based on these features, a combination of supervised and unsupervised classification is used to detect the regions in an audiobook where the speaker changes his regular voice quality to a particular voice style. The target voice style candidates are selected based on the agreement of the supervised classifier ensemble output, and evaluated in a listening test.
  • Keywords
    audio signal processing; pattern classification; speaker recognition; speech synthesis; unsupervised learning; wavelet transforms; audiobook; speaking style; supervised classifier ensemble; targeted voice style detection; tense voice quality; text synthesis; unsupervised classification; voice quality feature; wavelet transform; Educational institutions; Feature extraction; Speech; Support vector machines; Training; Vibrations; audiobooks; classifier ensemble; expressive speech; fuzzy support vector machines; voice quality
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6288941
  • Filename
    6288941