DocumentCode :
1118102
Title :
Perceptual Long-Term Variable-Rate Sinusoidal Modeling of Speech
Author :
Girin, Laurent ; Firouzmand, Mohammad ; Marchand, Sylvain
Author_Institution :
Speech Commun. Lab., Nat. Polytech. Inst. of Grenoble
Volume :
15
Issue :
3
fYear :
2007
fDate :
3/1/2007 12:00:00 AM
Firstpage :
851
Lastpage :
861
Abstract :
This paper addresses the problem of modeling the time trajectories of the sinusoidal components of voiced speech signals. A new global approach is presented: a single so-called long-term (LT) model, based on discrete cosine functions, is used to model the overall trajectories of the amplitude and phase parameters over each entire voiced section of speech, unlike the usual short-term models, which are defined on a frame-by-frame basis. The complete analysis-modeling-synthesis process is presented, including an iterative algorithm for optimal fitting between the LT model and the measurements. A major concern of the paper is the use of perceptual criteria in the LT model fitting process, for both amplitude and phase modeling. An adaptation of perceptual criteria usually defined for the short-term and/or stationary cases to long-term processing is proposed. Experiments on the first ten harmonics of voiced signals show that the proposed approach provides an efficient variable-rate representation of voiced speech signals. Promising results are given in terms of modeling accuracy, synthesis quality, and data compression. The relevance of the presented approach to speech coding and speech watermarking is discussed.
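As a rough illustration of the long-term idea summarized in the abstract, the following minimal sketch fits a sum of discrete cosine functions to one parameter trajectory spanning a whole voiced section, then raises the model order until the fit is close enough. Everything specific here is an assumption rather than the paper's method: the DCT-II style basis, the plain least-squares fit, the RMS stopping rule (a simple stand-in for the perceptual criteria the paper describes), and the names `cosine_basis`, `fit_long_term_model`, and `smallest_order`.

```python
import numpy as np


def cosine_basis(num_frames, order):
    """Return a (num_frames, order + 1) matrix of discrete cosine functions.

    Assumed form: DCT-II style cosines sampled at frame centres.
    """
    n = np.arange(num_frames)
    p = np.arange(order + 1)
    return np.cos(np.pi * np.outer(n + 0.5, p) / num_frames)


def fit_long_term_model(trajectory, order):
    """Least-squares fit of one cosine model to a whole trajectory."""
    basis = cosine_basis(len(trajectory), order)
    coeffs, *_ = np.linalg.lstsq(basis, trajectory, rcond=None)
    return coeffs, basis @ coeffs


def smallest_order(trajectory, max_rms_error, max_order):
    """Increase the order until the RMS error is small enough.

    The RMS threshold is a hypothetical stand-in for a perceptual
    criterion; it is not taken from the paper.
    """
    for order in range(max_order + 1):
        coeffs, modeled = fit_long_term_model(trajectory, order)
        rms = float(np.sqrt(np.mean((trajectory - modeled) ** 2)))
        if rms <= max_rms_error:
            break
    return order, coeffs, rms


# Synthetic amplitude trajectory over a 60-frame voiced section.
num_frames = 60
t = (np.arange(num_frames) + 0.5) / num_frames
measured = 1.0 + 0.3 * np.cos(np.pi * t) - 0.1 * np.cos(3 * np.pi * t)

order, coeffs, rms = smallest_order(measured, max_rms_error=0.01, max_order=10)
print("selected order:", order)
print("coefficients:", np.round(coeffs, 4))
print("rms error:", rms)
```

Because the selected order depends on how smooth the trajectory is, the number of coefficients stored per voiced section varies with the signal, which is the sense in which such a representation is variable-rate.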
Keywords :
discrete cosine transforms; iterative methods; speech synthesis; amplitude modeling; analysis-modeling-synthesis process; data compression; discrete cosine functions; iterative algorithm; modeling accuracy; optimal fitting; perceptual criteria; perceptual long-term variable-rate sinusoidal speech modeling; phase modeling; speech coding; speech watermarking; synthesis quality; time-trajectory modeling; voiced speech signal; Algorithm design and analysis; Fourier transforms; Frequency; Interpolation; Iterative algorithms; Laboratories; Signal synthesis; Speech coding; Speech processing; Speech synthesis; Perceptual models; sinusoidal model; speech modeling; speech processing; variable rate;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2006.885928
Filename :
4100680