• DocumentCode
    2768941
  • Title
    Dynamic language modeling for a daily broadcast news transcription system
  • Author
    Martins, Ciro ; Teixeira, António ; Neto, João
  • Author_Institution
    Aveiro Univ., Aveiro
  • fYear
    2007
  • fDate
    9-13 Dec. 2007
  • Firstpage
    165
  • Lastpage
    170
  • Abstract
    When transcribing Broadcast News (BN) data in highly inflected languages, vocabulary growth leads to high out-of-vocabulary (OOV) rates. To address this problem, we propose a daily, unsupervised adaptation approach that dynamically adapts the active vocabulary and language model (LM) to the topic of the current news segment during a multi-pass speech recognition process. Based on texts available daily on the Web, a story-based vocabulary is selected using a morpho-syntactic technique. Using an Information Retrieval engine, relevant documents are extracted from a large corpus to generate a story-based LM. Experiments were carried out for a European Portuguese BN transcription system. Preliminary results yield relative reductions of 65.2% in OOV rate and 6.6% in word error rate (WER).
  • Keywords
    information retrieval; natural language interfaces; speech recognition; broadcast news data; daily adaptation approach; daily broadcast news transcription system; dynamic language modeling; information retrieval engine; morpho-syntactic technique; multi-pass speech recognition process; unsupervised adaptation approach; Automatic speech recognition; Broadcasting; Data mining; Engines; Information retrieval; Natural languages; Speech recognition; Training data; Vocabulary; World Wide Web
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
  • Conference_Location
    Kyoto
  • Print_ISBN
    978-1-4244-1746-9
  • Electronic_ISBN
    978-1-4244-1746-9
  • Type
    conf
  • DOI
    10.1109/ASRU.2007.4430103
  • Filename
    4430103