Phrase-level transduction model with reordering for spoken to written language transformation

Author

Xu, Ping ; Fung, Pascale ; Chan, Ricky

Author_Institution

Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China

fYear

2012

fDate

25-30 March 2012

Firstpage

4965

Lastpage

4968

Abstract

This paper proposes the first phrase-level transduction model with reordering for transforming colloquial speech directly into written-style transcription. The model can perform n-to-m transductions and is trained on a parallel corpus of verbatim and written-style transcriptions. It represents deletions, substitutions, and insertions, and it can also identify and represent inversion transductions. We implement the transduction model using weighted finite-state transducers (WFSTs) and integrate it into a WFST-based speech recognition search space to produce both verbatim speaking-style and written-style transcriptions. Evaluated on transforming Cantonese speech into standard written Chinese, the model achieves an 11.59% relative Word Error Rate (WER) reduction over an interpolated language model between Cantonese and standard Chinese speech, as well as a 5.72% relative WER reduction and a 14.82% relative Bilingual Evaluation Understudy (BLEU) improvement over the word-level transduction model.
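
To make the idea of n-to-m phrase-level transduction concrete, the following is a minimal Python sketch. It is not the authors' WFST implementation: it swaps in a plain greedy longest-match lookup over a small phrase table, and the table entries, the token names, the `transduce` function, and the `max_len` parameter are all hypothetical placeholders chosen for illustration. It only shows how a single table can express substitution, deletion, insertion-like expansion, and a local inversion.

```python
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]

# Hypothetical phrase table (illustrative tokens, not data from the paper):
# each source phrase maps to a target phrase of possibly different length.
PHRASE_TABLE: Dict[Phrase, Phrase] = {
    ("a", "b"): ("X",),       # 2-to-1 substitution
    ("c",): ("Y", "Z"),       # 1-to-2 substitution (expansion)
    ("um",): (),              # deletion of a spoken filler
    ("p", "q"): ("q", "p"),   # inversion: same tokens, reversed order
}


def transduce(tokens: List[str],
              table: Dict[Phrase, Phrase],
              max_len: int = 3) -> List[str]:
    """Greedy longest-match rewrite of `tokens` using `table`.

    At each position, try the longest source phrase first; if no phrase
    matches, copy the current token through unchanged.
    """
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in table:
                out.extend(table[key])
                i += n
                break
        else:
            # No phrase starts here: pass the token through as-is.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    print(transduce(["um", "a", "b", "d", "c", "p", "q"], PHRASE_TABLE))
```

A weighted model such as the one described in the abstract would score competing segmentations and reorderings rather than commit greedily, so this sketch should be read only as an illustration of the kinds of mappings involved.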

Keywords

natural language processing; speech recognition; BLEU; Cantonese speech; Chinese speech; WER reduction; WFST-based speech recognition search space; bilingual evaluation understudy; colloquial speech transform; first-ever phrase-level transduction model; inversion transduction; n-m transductions; verbatim speaking-style transcriptions; verbatim transcription parallel corpus; weighted finite-state transducers; word error rate reduction; word-level transduction model; written language transformation; written-style transcription; Computational modeling; Decoding; Hidden Markov models; Speech; Speech recognition; Standards; Transducers; WFST; phrase-level transduction; reordering; spoken to written language transformation;

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location

Kyoto

ISSN

1520-6149

Print_ISBN

978-1-4673-0045-2

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2012.6289034

Filename

6289034