Title :
Speech recognition of broadcast news for the European Portuguese language
Author :
Meinedo, Hugo ; Souto, Nuno ; Neto, Joao P.
Author_Institution :
L2F - Spoken Language Syst. Lab., INESC ID Lisboa/IST, Lisboa, Portugal
Abstract :
This paper describes our work on the development of a large vocabulary continuous speech recognition system applied to a broadcast news (BN) task for the European Portuguese language, within the scope of the ALERT project. We start by presenting the baseline recogniser, AUDIMUS, which was originally developed with a corpus of read newspaper text. It is a hybrid system that combines phone probabilities generated by several multilayer perceptrons (MLPs) trained on distinct feature sets. The paper details the modifications introduced in this system, namely the development of a new language model, the vocabulary and pronunciation lexicon, and the training on the data currently available from the ALERT BN corpus. The system trained with this BN corpus achieved a word error rate (WER) of 18.4% when tested on the F0 focus condition (studio, planned, native, clean) and 35.2% when tested on all focus conditions.
Keywords :
hidden Markov models; linguistics; multilayer perceptrons; speech recognition; ALERT BN corpus; ALERT project; AUDIMUS; European Portuguese language; F0 focus condition; MLPs; broadcast news task; feature sets; hybrid system; language model; large vocabulary continuous speech recognition system; phone probabilities; pronunciation lexicon; vocabulary; Audio recording; Broadcasting; Databases; Multimedia systems; Natural languages; Speech recognition; Streaming media; System testing; TV; Vocabulary;
Conference_Title :
Automatic Speech Recognition and Understanding, 2001. ASRU '01. IEEE Workshop on
Print_ISBN :
0-7803-7343-X
DOI :
10.1109/ASRU.2001.1034651