• DocumentCode
    1118102
  • Title

    Perceptual Long-Term Variable-Rate Sinusoidal Modeling of Speech

  • Author

    Girin, Laurent ; Firouzmand, Mohammad ; Marchand, Sylvain

  • Author_Institution
    Speech Commun. Lab., Nat. Polytech. Inst. of Grenoble
  • Volume
    15
  • Issue
    3
  • fYear
    2007
  • fDate
    3/1/2007 12:00:00 AM
  • Firstpage
    851
  • Lastpage
    861
  • Abstract
    In this paper, the problem of modeling the time-trajectory of the sinusoidal components of voiced speech signals is addressed. A new global approach is presented: a single so-called long-term (LT) model, based on discrete cosine functions, is used to model the overall trajectories of the amplitude and phase parameters for each entire voiced section of speech, in contrast to the usual (short-term) models defined on a frame-by-frame basis. The complete analysis-modeling-synthesis process is presented, including an iterative algorithm for optimal fitting between the LT model and the measurements. A major focus of this paper is the use of perceptual criteria in the LT model fitting process (for both amplitude and phase modeling). An adaptation to long-term processing of perceptual criteria usually defined for the short-term and/or stationary cases is proposed. Experiments on the first ten harmonics of voiced signals show that the proposed approach provides an efficient variable-rate representation of voiced speech signals. Promising results are given in terms of modeling accuracy, synthesis quality, and data compression. The relevance of the presented approach to speech coding and speech watermarking is discussed.
  • Keywords
    discrete cosine transforms; iterative methods; speech synthesis; amplitude modeling; analysis-modeling-synthesis process; data compression; discrete cosine functions; iterative algorithm; modeling accuracy; optimal fitting; perceptual criteria; perceptual long-term variable-rate sinusoidal speech modeling; phase modeling; speech coding; speech watermarking; synthesis quality; time-trajectory modeling; voiced speech signal; Algorithm design and analysis; Fourier transforms; Frequency; Interpolation; Iterative algorithms; Laboratories; Signal synthesis; Speech processing; Perceptual models; sinusoidal model; speech modeling; variable rate
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2006.885928
  • Filename
    4100680
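
The abstract describes fitting a single discrete-cosine model to a whole parameter trajectory, with an iterative search for the best fit. The following is a minimal sketch of that general idea only, not the authors' algorithm: the function names (`dct_basis`, `fit_lt_model`), the least-squares fit, and the plain RMS error threshold are all assumptions made for illustration. In particular, the RMS threshold merely stands in for the perceptual criteria of the paper, which are not reproduced here.

```python
import numpy as np


def dct_basis(n_samples, order):
    """Matrix whose columns are cos(p*pi*(n+0.5)/N) for p = 0..order."""
    n = np.arange(n_samples)
    p = np.arange(order + 1)
    return np.cos(np.pi * np.outer(n + 0.5, p) / n_samples)


def fit_lt_model(trajectory, max_error, max_order=None):
    """Raise the model order until the RMS fitting error is small enough.

    Hypothetical helper: a simple RMS threshold replaces the perceptual
    criteria used in the paper.
    """
    x = np.asarray(trajectory, dtype=float)
    n_samples = x.size
    if max_order is None:
        max_order = n_samples - 1  # a full-order basis fits exactly
    for order in range(max_order + 1):
        basis = dct_basis(n_samples, order)
        # Least-squares fit of the cosine coefficients to the measurements.
        coeffs, *_ = np.linalg.lstsq(basis, x, rcond=None)
        model = basis @ coeffs
        rms = np.sqrt(np.mean((x - model) ** 2))
        if rms <= max_error:
            break
    return coeffs, model, rms


# Usage: a synthetic 50-point "amplitude trajectory" for one voiced section.
n = np.arange(50)
trajectory = 1.0 + 0.5 * np.cos(np.pi * (n + 0.5) * 2 / 50)
coeffs, model, rms = fit_lt_model(trajectory, max_error=1e-6)
```

Because the model order is chosen per section, the number of stored coefficients varies with the complexity of each trajectory, which is the sense in which such a representation is variable-rate.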