Perceptual Long-Term Variable-Rate Sinusoidal Modeling of Speech

Author

Girin, Laurent ; Firouzmand, Mohammad ; Marchand, Sylvain

Author_Institution

Speech Commun. Lab., Nat. Polytech. Inst. of Grenoble

Volume

15

Issue

3

fYear

2007

fDate

3/1/2007

Firstpage

851

Lastpage

861

Abstract

In this paper, the problem of modeling the time-trajectory of the sinusoidal components of voiced speech signals is addressed. A new global approach is presented: a single so-called long-term (LT) model, based on discrete cosine functions, is used to model the overall trajectories of the amplitude and phase parameters over each entire voiced section of speech, in contrast to the usual (short-term) models, which are defined on a frame-by-frame basis. The complete analysis-modeling-synthesis process is presented, including an iterative algorithm for optimal fitting between the LT model and the measurements. A major issue of this paper concerns the use of perceptual criteria in the LT model fitting process (for both amplitude and phase modeling). An adaptation to long-term processing of perceptual criteria usually defined in the short-term and/or stationary cases is proposed. Experiments dealing with the first ten harmonics of voiced signals show that the proposed approach provides an efficient variable-rate representation of voiced speech signals. Promising results are given in terms of modeling accuracy, synthesis quality, and data compression. The interest of the presented approach for speech coding and speech watermarking is discussed.
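The idea of fitting one cosine-based model to a whole voiced section, with the model order raised iteratively until the fit is acceptable, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a DCT-II style basis, an ordinary least-squares fit, and a plain maximum-absolute-error threshold in place of the perceptual criteria mentioned in the abstract. The names `cosine_basis`, `fit_long_term`, and `max_error` are hypothetical.

```python
import numpy as np

def cosine_basis(n_frames, order):
    """Return an (n_frames, order + 1) matrix of discrete cosine functions."""
    n = np.arange(n_frames)
    k = np.arange(order + 1)
    # Column k is cos(pi * k * (n + 0.5) / n_frames), a DCT-II style basis
    # (an assumption; the abstract does not specify the exact basis).
    return np.cos(np.pi * np.outer(n + 0.5, k) / n_frames)

def fit_long_term(trajectory, max_error, max_order=None):
    """Raise the model order until the worst-case fit error is <= max_error.

    The error test is a simple stand-in for a perceptual criterion.
    """
    y = np.asarray(trajectory, dtype=float)
    n_frames = y.size
    if max_order is None:
        max_order = n_frames - 1  # a full-order basis fits the data exactly
    for order in range(max_order + 1):
        basis = cosine_basis(n_frames, order)
        # Least-squares fit of the model coefficients to the measured trajectory.
        coeffs, *_ = np.linalg.lstsq(basis, y, rcond=None)
        model = basis @ coeffs
        if np.max(np.abs(model - y)) <= max_error:
            break
    return coeffs, model

# Hypothetical amplitude trajectory of one harmonic over a 50-frame voiced section.
n = np.arange(50)
amplitude = 1.0 + 0.5 * np.cos(np.pi * 2 * (n + 0.5) / 50)
coeffs, model = fit_long_term(amplitude, max_error=1e-6)
print(len(coeffs), "coefficients represent", len(amplitude), "frames")
```

Because the number of retained coefficients depends on how smooth each trajectory is, the size of the representation varies from one voiced section to the next, which is the sense in which such a scheme is variable-rate.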

Keywords

discrete cosine transforms; iterative methods; speech synthesis; amplitude modeling; analysis-modeling-synthesis process; data compression; discrete cosine functions; iterative algorithm; modeling accuracy; optimal fitting; perceptual criteria; perceptual long-term variable-rate sinusoidal speech modeling; phase modeling; speech coding; speech watermarking; synthesis quality; time-trajectory modeling; voiced speech signal; Algorithm design and analysis; Fourier transforms; Frequency; Interpolation; Iterative algorithms; Laboratories; Signal synthesis; Speech coding; Speech processing; Speech synthesis; Perceptual models; sinusoidal model; speech modeling; speech processing; variable rate

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2006.885928

Filename

4100680