Regional Information Center for Science and Technology - Segmentation of Monologues in Audio Books for Building Synthetic Voices

DocumentCode :

1339104

Title :

Segmentation of Monologues in Audio Books for Building Synthetic Voices

Author :

Prahallad, Kishore ; Black, Alan W.

Author_Institution :

Int. Inst. of Inf. Technol., Hyderabad, India

Volume :

Issue :

fYear :

2011

fDate :

7/1/2011

Firstpage :

1444

Lastpage :

1449

Abstract :

One of the issues in using audio books for building a synthetic voice is the segmentation of large speech files. The use of the Viterbi algorithm to obtain phone boundaries on large audio files fails primarily because of huge memory requirements. Earlier works have attempted to resolve this problem by using a large-vocabulary speech recognition system employing a restricted dictionary and language model. In this paper, we propose suitable modifications to the Viterbi algorithm and demonstrate its usefulness for segmentation of large speech files in audio books. The utterances obtained from large speech files in audio books are used to build synthetic voices. We show that synthetic voices built from audio books in the public domain have Mel-cepstral distortion scores in the range of 4-7, which is similar to voices built from studio quality recordings such as CMU ARCTIC.
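
Note on the Mel-cepstral distortion (MCD) figure quoted in the abstract: the sketch below uses the commonly cited definition, MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2), averaged over frames. It is an illustration only. This record does not state how the authors computed their scores, so the number of coefficients, the exclusion of the energy term c0, and the frame alignment are all assumptions, and the function name `mel_cepstral_distortion` is hypothetical.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Return the mean MCD in dB over paired frames.

    Assumptions (not taken from the paper): the two sequences are already
    time-aligned frame by frame, and the caller has dropped the energy
    coefficient c0 from every frame.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("need two non-empty sequences of equal length")

    scale = 10.0 / math.log(10.0)  # converts the log-spectral distance to dB
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        if len(ref) != len(syn):
            raise ValueError("frames must have the same number of coefficients")
        squared_error = sum((r - s) ** 2 for r, s in zip(ref, syn))
        total += scale * math.sqrt(2.0 * squared_error)
    return total / len(ref_frames)


if __name__ == "__main__":
    # Toy coefficient values, invented purely for illustration.
    reference = [[0.5, 0.0], [0.2, 0.1]]
    synthetic = [[0.0, 0.5], [0.2, 0.1]]
    print(round(mel_cepstral_distortion(reference, synthetic), 3))
```

Lower values mean the synthesized cepstra are closer to the reference recording, which is why a score range comparable to that of studio recordings is reported as a positive result.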

Keywords :

audio databases; audio recording; cepstral analysis; dictionaries; speech processing; speech recognition; speech synthesis; vocabulary; CMU ARCTIC; Mel-cepstral distortion scores; Viterbi algorithm; audio books; audio files; language model; memory requirements; monologue segmentation; phone boundary; public domain; restricted dictionary; speech files segmentation; studio quality recordings; synthetic voices; utterances; vocabulary speech recognition system; Books; Buildings; Databases; Feature extraction; Hidden Markov models; Speech; Viterbi algorithm; Audio books; forced-alignment; large speech files; text-to-speech (TTS);

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2010.2081980

Filename :

5590284

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1339104