Regional Information Center for Science and Technology (RICeST) - Towards High-Quality Next-Generation Text-to-Speech Synthesis: A Multidomain Approach by Automatic Domain Classification

DocumentCode :

819918

Title :

Towards High-Quality Next-Generation Text-to-Speech Synthesis: A Multidomain Approach by Automatic Domain Classification

Author :

Alías, Francesc ; Sevillano, Xavier ; Socoró, Joan Claudi ; Gonzalvo, Xavier

Author_Institution :

Grup de Recerca en Processament Multimodal, Univ. Ramon Llull, Barcelona

Volume :

Issue :

fYear :

2008

Firstpage :

1340

Lastpage :

1354

Abstract :

This paper contributes to recent advances in the development of high-quality next-generation text-to-speech (TTS) synthesis systems. Two of the most active research topics in this area are improving speech expressiveness and improving the flexibility of synthesis. In this context, this paper presents a new TTS strategy called multidomain TTS (MD-TTS) for synthesizing across different domains. Although the multidomain philosophy has been widely applied in spoken language systems, few research efforts have extended it to the TTS field. To do so, several proposals are described in this paper. First, a text classifier (TC) is included in the classic TTS architecture to automatically select the most appropriate domain for synthesizing the input text. In contrast to classic topic text classification tasks, the MD-TTS TC should consider not only the contents of the text but also its structure. To this end, this paper introduces a new text modeling scheme based on an associative relational network, which represents texts as a directional weighted word-based graph. The experiments validate the proposal in terms of both objective (TC efficiency) and subjective (perceived synthetic speech quality) evaluation criteria.
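
Illustrative note (not part of the original record): the abstract describes texts being represented as a directional weighted word-based graph. The sketch below shows one simple way such a structure might be built, purely as an illustration. It assumes edges link each word to the word that immediately follows it and that edge weights are raw co-occurrence counts; the abstract does not specify the weighting used by the associative relational network, and the function name `build_word_graph` is hypothetical.

```python
from collections import defaultdict


def build_word_graph(text):
    """Return {source_word: {next_word: count}} for consecutive word pairs.

    Assumption: edge weight = number of times next_word directly follows
    source_word. This is an illustrative choice, not the paper's scheme.
    """
    words = text.lower().split()
    graph = defaultdict(lambda: defaultdict(int))
    # Each adjacent pair (current, following) adds weight to a directed edge.
    for current, following in zip(words, words[1:]):
        graph[current][following] += 1
    # Convert nested defaultdicts to plain dicts for a stable, printable result.
    return {src: dict(dsts) for src, dsts in graph.items()}


graph = build_word_graph("the cat sat on the mat the cat ran")
print(graph["the"])
```

Because the edges are directed, such a graph keeps word-order information that a plain bag-of-words model discards, which matches the abstract's point that the classifier should consider the structure of a text as well as its contents.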

Keywords :

graph theory; speech processing; speech synthesis; text analysis; word processing; associative relational network; automatic domain classification; directional weighted word-based graph; multidomain approach; perceived synthetic speech quality; text classification tasks; text classifier; text-to-speech synthesis; text processing

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2008.925145

Filename :

4581657

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=819918