• DocumentCode
    294529
  • Title
    WSJCAM0: a British English speech corpus for large vocabulary continuous speech recognition
  • Author
    Robinson, Tony ; Fransen, Jeroen ; Pye, David ; Foote, Jonathan ; Renals, Steve
  • Author_Institution
    Dept. of Eng., Cambridge Univ., UK
  • Volume
    1
  • fYear
    1995
  • fDate
    9-12 May 1995
  • Firstpage
    81
  • Abstract
    A significant new speech corpus of British English has been recorded at Cambridge University. Derived from the Wall Street Journal text corpus, WSJCAM0 constitutes one of the largest corpora of spoken British English currently in existence. It has been specifically designed for the construction and evaluation of speaker-independent speech recognition systems. The database consists of 140 speakers, each speaking about 110 utterances. This paper describes the motivation for the corpus, the processes undertaken in its construction, and the utilities needed as support tools. All utterance transcriptions have been verified, and a phonetic dictionary has been developed to cover the training data and evaluation tasks. Two evaluation tasks have been defined using standard 5000-word bigram and 20000-word trigram language models. The paper concludes with comparative results on these tasks for British and American English.
  • Keywords
    grammars; natural languages; speech processing; speech recognition; vocabulary; American English; British English speech corpus; Cambridge University; WSJCAM0; Wall Street Journal text corpus; database; evaluation tasks; large vocabulary continuous speech recognition; phonetic dictionary; speaker-independent speech recognition systems; spoken British English; support tools; training data; utterance transcriptions; word bigram language models; word trigram language models; Databases; Loudspeakers; Microphones; Natural languages; Preamplifiers; Robustness; Speech recognition; Testing; Training data; Vocabulary; Workstations
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 International Conference on
  • Conference_Location
    Detroit, MI
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-2431-5
  • Type
    conf
  • DOI
    10.1109/ICASSP.1995.479278
  • Filename
    479278