LP and TD-PSOLA-based incorporation of happiness in neutral speech using time-domain parameters

Author

Sreenidhi, S. ; Rachel, G. Anushiya ; Vijayalakshmi, P. ; Nagarajan, T.

Author_Institution

SSN Coll. of Eng., Chennai, India

fYear

2014

Firstpage

1158

Lastpage

1162

Abstract

Emotions express a person's internal state of being and are reflected in speech utterances. Emotions affect the time-domain characteristics of the speech signal, namely intonation patterns, speech rate, and the short-term energy function. Conventional text-to-speech (TTS) systems are built to produce speech utterances for a given text without any emotion, which can be called neutral speech. Building a TTS system that can produce speech utterances with an expected emotion is not a trivial task, in the sense that for each emotion a separate speech corpus must be carefully collected and a system built. Therefore, the current work focuses on incorporating happiness into neutral speech using signal processing algorithms. In this regard, neutral and happy speech are analyzed, and it is found that happiness is perceived in certain emotive words in a sentence. Thus, to introduce happiness into neutral speech, these emotive keywords are identified and the above-mentioned time-domain parameters are modified. Linear prediction-based synthesis of happy speech is initially performed. To improve the quality of the synthesized speech, TD-PSOLA is then used. Subjective evaluation yields a mean opinion score of 2.05 (out of a maximum of 3) for happy speech synthesized using linear prediction and 2.53 for that synthesized using TD-PSOLA.
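
As a rough illustration of one of the time-domain parameters named in the abstract, the short-term energy function, the sketch below computes frame-wise energy and raises it over a single segment standing in for an emotive keyword. It is not the authors' method: the function names, the toy sine signal, the segment boundaries, and the gain factor are all assumptions chosen for the example, and pitch and speech-rate modification (the LP and TD-PSOLA parts) are not shown.

```python
import math

def short_term_energy(signal, frame_len, hop):
    """Sum of squared samples per frame (rectangular window)."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies

def boost_segment(signal, start, end, energy_gain):
    """Scale samples in [start, end) so that segment's energy is
    multiplied by energy_gain (amplitude scales by its square root)."""
    amp = math.sqrt(energy_gain)
    return [x * amp if start <= i < end else x
            for i, x in enumerate(signal)]

# Toy stand-in for neutral speech: 0.1 s of a 200 Hz sine at 8 kHz.
fs = 8000
signal = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]

# Hypothetical keyword span (samples 320..479) and gain; both are assumed.
boosted = boost_segment(signal, 320, 480, energy_gain=2.0)

before = short_term_energy(signal, frame_len=160, hop=160)
after = short_term_energy(boosted, frame_len=160, hop=160)
```

Comparing `before` and `after` frame by frame shows the energy changing only in the frame that covers the chosen segment, which is the kind of localized, keyword-level modification the abstract describes.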

Keywords

speech processing; speech synthesis; time-domain analysis; LP; TD-PSOLA; TTS system; emotive keywords; happy speech; intonation patterns; linear prediction-based synthesis; neutral speech; person internal state; short-term energy function; signal processing algorithms; speech corpus; speech rate; speech signal; speech utterances; text-to-speech systems; time-domain parameters; time-domain pitch synchronous overlap-add-based synthesis techniques; Computers; Polynomials; Spectrogram; Speech; happiness incorporation; linear prediction; pitch contour; short-term energy

fLanguage

English

Publisher

IEEE

Conference_Titel

Circuit, Power and Computing Technologies (ICCPCT), 2014 International Conference on

Print_ISBN

978-1-4799-2395-3

Type

conf

DOI

10.1109/ICCPCT.2014.7054931

Filename

7054931