• DocumentCode
    730738
  • Title
    Improving multiple-crowd-sourced transcriptions using a speech recogniser
  • Author
    van Dalen, R.C. ; Knill, K.M. ; Tsiakoulis, P. ; Gales, M.J.F.
  • Author_Institution
    Dept. of Eng., Trumpington Street, Univ. of Cambridge, Cambridge, UK
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4709
  • Lastpage
    4713
  • Abstract
    This paper introduces a method to produce high-quality transcriptions of speech data from only two crowd-sourced transcriptions. These transcriptions, produced cheaply by people on the Internet, for example through Amazon Mechanical Turk, are often of low quality. Often, multiple crowd-sourced transcriptions are combined to form one transcription of higher quality. However, the state of the art is to use essentially a form of majority voting, which requires at least three transcriptions for each utterance. This paper shows how to refine this approach to work with only two transcriptions. It then introduces a method that uses a speech recogniser (bootstrapped on a simple combination scheme) to combine transcriptions. When only two crowd-sourced transcriptions are available, on a noisy data set this improves the word error rate to gold-standard transcriptions by 21% relative.
  • Keywords
    Internet; bootstrapping; speech recognition; Amazon Mechanical Turk; gold-standard transcriptions; high-quality transcriptions; majority voting; multiple crowd sourced transcriptions; simple combination scheme; speech data; speech recogniser; word error rate; Acoustics; Error analysis; Hidden Markov models; Speech; Speech recognition; Standards; Training; Automatic speech recognition; crowd-sourcing; transcription combination
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178864
  • Filename
    7178864