Multilevel annotation of speech signals using weighted finite state transducers

Author

Paulo, Sérgio; Oliveira, Luís

Author_Institution

Spoken Language Syst. Lab, INESC, Lisbon, Portugal

fYear

2002

fDate

11-13 Sept. 2002

Firstpage

111

Lastpage

114

Abstract

The purpose of this work was the development of a set of tools to automate the process of multilevel annotation of speech signals, preserving the alignments between the different levels of the utterance's linguistic representation. Our goal is to build speech databases with multilevel relational annotations, using speech from non-professional speakers, that can be used for the development of concatenative-based text-to-speech synthesizers or for training and testing statistical models. The method is based on the linguistic analysis of the transcription of the spoken material performed by a TTS system. The predicted phone sequence is then compared with the sequence produced by the speaker. The problem of aligning these two sequences is solved in a language-independent way using Weighted Finite State Transducers. After the alignment, a re-synchronization procedure is applied to the remaining levels to put them in agreement with the spoken utterance.
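
The alignment step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes uniform, hypothetical edit costs (SUB_COST, INS_COST, DEL_COST) and uses plain dynamic programming, which finds the same lowest-cost path that a shortest-path search would find through the composition of a predicted-phone acceptor, an edit transducer with those weights, and a spoken-phone acceptor. The function name align_phones and the phone symbols are illustrative only.

```python
from typing import List, Optional, Tuple

# Hypothetical edit weights; the abstract does not state the weights actually used.
SUB_COST = 1.0  # predicted phone replaced by a different spoken phone
INS_COST = 1.0  # speaker produced a phone that was not predicted
DEL_COST = 1.0  # predicted phone that the speaker did not produce

Pair = Tuple[Optional[str], Optional[str]]


def align_phones(predicted: List[str], spoken: List[str]) -> Tuple[float, List[Pair]]:
    """Return the minimum total cost and the aligned (predicted, spoken) pairs.

    None on either side of a pair marks an insertion or a deletion.
    """
    n, m = len(predicted), len(spoken)
    # cost[i][j] = cheapest way to align predicted[:i] with spoken[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + DEL_COST
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + INS_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if predicted[i - 1] == spoken[j - 1] else SUB_COST
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub,    # match or substitution
                cost[i - 1][j] + DEL_COST,   # deletion
                cost[i][j - 1] + INS_COST,   # insertion
            )

    # Trace back from the final cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0.0 if predicted[i - 1] == spoken[j - 1] else SUB_COST
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                pairs.append((predicted[i - 1], spoken[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + DEL_COST:
            pairs.append((predicted[i - 1], None))
            i -= 1
            continue
        pairs.append((None, spoken[j - 1]))
        j -= 1
    pairs.reverse()
    return cost[n][m], pairs


if __name__ == "__main__":
    # Illustrative phone strings: the speaker drops the final predicted vowel.
    total, pairs = align_phones(["k", "a", "z", "a"], ["k", "a", "z"])
    print(total, pairs)
```

In a sketch like this, the pairs containing None are the places where the higher annotation levels would need the re-synchronization mentioned in the abstract; phone-dependent weights could replace the uniform costs if confusion statistics were available.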

Keywords

linguistics; speech processing; speech synthesis; statistical analysis; TTS system; concatenative-based text-to-speech synthesizers; linguistic analysis; linguistic representation; multilevel relational annotations; predicted phone sequence; speech databases; speech signals; statistical model testing; statistical model training; weighted finite state transducers; Natural languages; Performance analysis; Relational databases; Signal processing; Speech processing; Speech recognition; Speech synthesis; Synthesizers; Testing; Transducers

fLanguage

English

Publisher

IEEE

Conference_Titel

Speech Synthesis, 2002. Proceedings of 2002 IEEE Workshop on

Print_ISBN

0-7803-7395-2

Type

conf

DOI

10.1109/WSS.2002.1224384

Filename

1224384