• DocumentCode
    109029
  • Title

    Joint Source-Filter Optimization for Accurate Vocal Tract Estimation Using Differential Evolution

  • Author

    Schleusing, O.; Kinnunen, Tomi; Story, Brad; Vesin, Jean-Marc

  • Author_Institution
    Dept. of Syst. Eng., CSEM, Neuchatel, Switzerland
  • Volume
    21
  • Issue
    8
  • fYear
    2013
  • fDate
    Aug. 2013
  • Firstpage
    1560
  • Lastpage
    1572
  • Abstract
    In this work, we present a joint source-filter optimization approach for separating voiced speech into vocal tract (VT) and voice source components. The presented method is pitch-synchronous and thereby exhibits a high robustness against vocal jitter, shimmer and other glottal variations while covering various voice qualities. The voice source is modeled using the Liljencrants-Fant (LF) model, which is integrated into a time-varying auto-regressive speech production model with exogenous input (ARX). The non-convex optimization problem of finding the optimal model parameters is addressed by a heuristic, evolutionary optimization method called differential evolution. The optimization method is first validated in a series of experiments with synthetic speech. Estimated glottal source and VT parameters are the criteria used for comparison with the iterative adaptive inverse filter (IAIF) method and the linear prediction (LP) method under varying conditions such as jitter, fundamental frequency (f0) as well as environmental and glottal noise. The results show that the proposed method largely reduces the bias and standard deviation of estimated VT coefficients and glottal source parameters. Furthermore, the performance of the source-filter separation is evaluated in experiments using speech generated with a physical model of speech production. The proposed method reliably estimates glottal flow waveforms and lower formant frequencies. Results obtained for higher formant frequencies indicate that research on more accurate voice source models and their interaction with the VT is necessary to improve the source-filter separation. The proposed optimization approach promises to be a useful tool for future research addressing this topic.
  • Keywords
    autoregressive processes; concave programming; evolutionary computation; parameter estimation; source separation; speech synthesis; ARX; LF model; Liljencrants-Fant model; VT parameter estimation; bias reduction; differential evolution; environmental noise; evolutionary optimization method; exogenous input; glottal noise; glottal source parameter estimation; glottal variation; heuristic; joint source-filter optimization; nonconvex optimization problem; optimal model parameter; pitch-synchronous method; shimmer; source-filter separation; speech generation; standard deviation; synthetic speech; time-varying auto-regressive speech production model; vocal jitter; vocal tract estimation; voice source component; voiced speech separation; global optimization; glottal inverse filtering; time-varying vocal tract estimation
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2013.2255275
  • Filename
    6488745