  • DocumentCode
    1749767
  • Title
    Perceptual and objective detection of discontinuities in concatenative speech synthesis
  • Author
    Stylianou, Yannis; Syrdal, Ann K.
  • Author_Institution
    Shannon Labs., AT&T Labs-Research, Florham Park, NJ, USA
  • Volume
    2
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    837
  • Abstract
    Concatenative speech synthesis systems attempt to minimize audible signal discontinuities between two successive concatenated units. An objective distance measure that can predict audible discontinuities is therefore very important, particularly in unit selection synthesis, where units are selected from a large inventory at run time. In this paper, we describe a perceptual test that measures how often humans detect concatenation discontinuities, and we then evaluate 13 different objective distance measures on their ability to predict the human results. Criteria used to classify these distances include the detection rate, the Bhattacharyya measure of separability of two distributions, and receiver operating characteristic (ROC) curves. Results show that the Kullback-Leibler distance on power spectra has the highest detection rate, followed by the Euclidean distance on Mel-frequency cepstral coefficients (MFCC).
  • Keywords
    cepstral analysis; spectral analysis; speech synthesis; Bhattacharyya measure of separability; Euclidean distance; Kullback-Leibler distance; Mel-frequency cepstral coefficients; ROC curves; audible discontinuities; concatenation discontinuity detection rate; concatenative speech synthesis; objective distance measure; perceptual test; power spectra; receiver operating characteristic curves; signal discontinuities; successive concatenated units; unit selection synthesis; Concatenated codes; Humans; Mel frequency cepstral coefficient; Particle measurements; Signal synthesis; Testing; Time measurement
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), Proceedings
  • Conference_Location
    Salt Lake City, UT
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7041-4
  • Type
    conf
  • DOI
    10.1109/ICASSP.2001.941045
  • Filename
    941045