  • DocumentCode
    3166494
  • Title
    Phrase-level transduction model with reordering for spoken to written language transformation
  • Author
    Xu, Ping ; Fung, Pascale ; Chan, Ricky
  • Author_Institution
    Dept. of Electron. & Comput. Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong, China
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    4965
  • Lastpage
    4968
  • Abstract
    This paper proposes a first-ever phrase-level transduction model with reordering to transform colloquial speech directly into written-style transcription. The model can perform n-m transductions, and it is trained on a parallel corpus of verbatim and written-style transcriptions. Deletions, substitutions, and insertions are well represented by this model, and inversion transduction cases can also be identified and represented. We implement the transduction model using weighted finite-state transducers (WFSTs) and integrate it into a WFST-based speech recognition search space to produce both verbatim speaking-style and written-style transcriptions. Evaluations of our model on Cantonese speech to standard written Chinese show an 11.59% relative Word Error Rate (WER) reduction over an interpolated language model between Cantonese and standard Chinese speech, and a 5.72% relative WER reduction and a 14.82% relative Bilingual Evaluation Understudy (BLEU) improvement over the word-level transduction model.
  • Keywords
    natural language processing; speech recognition; BLEU; Cantonese speech; Chinese speech; WER reduction; WFST-based speech recognition search space; bilingual evaluation understudy; colloquial speech transform; first-ever phrase-level transduction model; inversion transduction; n-m transductions; verbatim speaking-style transcriptions; verbatim transcription parallel corpus; weighted finite-state transducers; word error rate reduction; word-level transduction model; written language transformation; written-style transcription; Computational modeling; Decoding; Hidden Markov models; Speech; Speech recognition; Standards; Transducers; WFST; phrase-level transduction; reordering; spoken to written language transformation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6289034
  • Filename
    6289034