• DocumentCode
    591909
  • Title
    Transcription of multi-genre media archives using out-of-domain data
  • Author
    Bell, Peter J. ; Gales, Mark J.F. ; Lanchantin, Pierre ; Liu, Xunying ; Long, Yanhua ; Renals, Steve ; Swietojanski, Pawel ; Woodland, Philip C.
  • Author_Institution
    Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK
  • fYear
    2012
  • fDate
    2-5 Dec. 2012
  • Firstpage
    324
  • Lastpage
    329
  • Abstract
    We describe our work on developing a speech recognition system for multi-genre media archives. The high diversity of the data makes this a challenging recognition task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we present Multi-level Adaptive Networks (MLAN), a novel technique for incorporating information from out-of-domain posterior features using deep neural networks. We show that it provides a substantial reduction in WER over other systems, with relative WER reductions of 15% over a PLP baseline, 9% over in-domain tandem features and 8% over the best out-of-domain tandem features.
  • Keywords
    hidden Markov models; information retrieval systems; records management; speech recognition; MLAN; deep neural networks; hidden Markov model; in-domain tandem features; multigenre media archives transcription; multilevel adaptive networks; out-of-domain posterior features; relative WER reductions; speech recognition system; tandem HMM; Acoustics; Adaptation models; Hidden Markov models; Neural networks; Speech; Training; Training data; cross-domain adaptation; media archives; tandem
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop (SLT), 2012 IEEE
  • Conference_Location
    Miami, FL
  • Print_ISBN
    978-1-4673-5125-6
  • Electronic_ISBN
    978-1-4673-5124-9
  • Type
    conf
  • DOI
    10.1109/SLT.2012.6424244
  • Filename
    6424244