• DocumentCode
    417252
  • Title
    Advances in the automatic transcription of lectures
  • Author
    Cettolo, Mauro ; Brugnara, Fabio ; Federico, Marcello
  • Author_Institution
    Centro per la Ricerca Scientifica e Tecnologica, ITC-irst, Povo di Trento, Italy
  • Volume
    1
  • fYear
    2004
  • fDate
    17-21 May 2004
  • Abstract
    Transcribing lectures is a challenging task for both acoustic and language modeling. In this work, we present recent results on the automatic transcription of lectures from the Translanguage English Database, which contains recordings of talks given in English at Eurospeech '93, mostly by non-native speakers. For acoustic modeling, an acoustic model trained for a broadcast news transcription task was adapted to the lecture training data through maximum likelihood linear regression adaptation, including models of spontaneous speech phenomena. Moreover, a normalization procedure, consisting of cluster-based mean and variance normalization of the static features, was incorporated into the training stage. Language modeling was based on the adaptation of a background language model estimated on broadcast news transcripts, conference proceedings, lecture transcripts, and conversational speech transcripts. Among the adaptation techniques examined, the most effective exploited the paper presented in each lecture to be processed. The best transcription performance on a 2-hour test set was a 32.4% word error rate.
  • Keywords
    maximum likelihood estimation; regression analysis; speech recognition; Eurospeech '93; Translanguage English Database; acoustic language modeling; automatic lecture transcription; background language model; broadcast news transcription task; broadcast news transcripts; cluster-based mean; conference proceedings; conversational speech transcripts; lecture transcripts; maximum likelihood linear regression adaptation; spontaneous speech phenomena; training data; variance normalization; word error rate; Adaptation model; Broadcasting; Conference proceedings; Databases; Loudspeakers; Maximum likelihood linear regression; Natural languages; Speech; Testing; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-8484-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.2004.1326099
  • Filename
    1326099