Title :
Approaches to gathering realistic training data for speech translation systems
Author :
Bretan, Ivan ; Eklund, Robert ; MacDermid, Catriona
Author_Institution :
Telia Res. AB, Haninge, Sweden
Date :
30 Sep-1 Oct 1996
Abstract :
The Spoken Language Translator (SLT) is a multi-lingual speech-to-speech translation prototype supporting English, Swedish and French within the air traffic information system (ATIS) domain. The design of SLT is characterized by a strongly corpus-driven approach, which accentuates the need for cost-efficient collection procedures to obtain training data. This paper discusses various approaches to the data collection issue pursued within a speech translation framework. Original American English speech and language data have been collected using traditional Wizard-of-Oz (WOZ) techniques, a relatively costly procedure yielding high-quality results. The resulting corpus has been translated textually into Swedish by a large number of native speakers (427) and used as prompts for training the target language speech model. This “budget” collection method is compared to the accepted method, i.e., gathering data by means of a full-blown WOZ simulation. The results indicate that although translation in this case proved economical and produced considerable data, the method is not sensitive to certain features typical of spoken language, for which WOZ is superior.
Keywords :
air traffic control; language translation; natural language interfaces; speech recognition; English; French; Spoken Language Translator; Swedish; Wizard-of-Oz techniques; air traffic information system; corpus-driven approach; cost-efficient collection procedures; data collection; multi-lingual speech-to-speech translation prototype; realistic training data gathering; speech translation systems; Costs; Frequency; Humans; Information systems; Large-scale systems; Natural languages; Prototypes; Speech recognition; Training data; Vocabulary
Conference_Title :
Interactive Voice Technology for Telecommunications Applications, 1996. Proceedings., Third IEEE Workshop on
Conference_Location :
Basking Ridge, NJ
Print_ISBN :
0-7803-3238-5
DOI :
10.1109/IVTTA.1996.552770