DocumentCode
2768941
Title
Dynamic language modeling for a daily broadcast news transcription system
Author
Martins, Ciro ; Teixeira, António ; Neto, João
Author_Institution
Aveiro Univ., Aveiro
fYear
2007
fDate
9-13 Dec. 2007
Firstpage
165
Lastpage
170
Abstract
When transcribing Broadcast News (BN) data in highly inflected languages, vocabulary growth leads to high out-of-vocabulary (OOV) rates. To address this problem, we propose a daily, unsupervised adaptation approach that dynamically adapts the active vocabulary and language model (LM) to the topic of the current news segment during a multi-pass speech recognition process. Based on texts available daily on the Web, a story-based vocabulary is selected using a morpho-syntactic technique. Using an Information Retrieval engine, relevant documents are extracted from a large corpus to generate a story-based LM. Experiments were carried out for a European Portuguese BN transcription system. Preliminary results show relative reductions of 65.2% in OOV rate and 6.6% in word error rate (WER).
Keywords
information retrieval; natural language interfaces; speech recognition; broadcast news data; daily adaptation approach; daily broadcast news transcription system; dynamic language modeling; information retrieval engine; morpho-syntactic technique; multi-pass speech recognition process; unsupervised adaptation approach; Automatic speech recognition; Broadcasting; Data mining; Engines; Natural languages; Training data; Vocabulary; World Wide Web
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
Conference_Location
Kyoto
Print_ISBN
978-1-4244-1746-9
Electronic_ISBN
978-1-4244-1746-9
Type
conf
DOI
10.1109/ASRU.2007.4430103
Filename
4430103