Regional Information Center for Science and Technology - Automatic punctuation generation for speech

DocumentCode :

2973076

Title :

Automatic punctuation generation for speech

Author :

Shen, Wenzhu ; Yu, Roger Peng ; Seide, Frank ; Wu, Ji

Author_Institution :

Microsoft Res. Asia, Beijing, China

fYear :

2009

fDate :

Nov. 13 2009-Dec. 17 2009

Firstpage :

586

Lastpage :

589

Abstract :

Automatic generation of punctuation is an essential feature for many speech-to-text transcription tasks. This paper describes a maximum a-posteriori (MAP) approach for inserting punctuation marks into raw word sequences obtained from automatic speech recognition (ASR). The system consists of an "acoustic model" (AM) for prosodic features (actually pause duration) and a "language model" (LM) for text-only features. The LM combines three components: an MLP-based trigger-word model and a forward and a backward trigram punctuation predictor. The separation into acoustic and language model allows to learn these models on different corpora, especially allowing the LM to be trained on large amounts of data (text) for which no acoustic information is available. We find that the trigger-word LM is very useful, and further improvement can be achieved when combining both prosodic and lexical information. We achieve an F-measure of 81.0% and 56.5% for voicemails and podcasts, respectively, on reference transcripts, and 69.6% for voicemails on ASR transcripts.
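
The MAP combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it decides each word boundary independently, whereas the paper may search over whole sequences. The punctuation inventory `PUNCT`, the exponential pause density in `acoustic_loglik`, its mean values, and the `lm_weight` parameter are all assumptions made for illustration. The LM log-probabilities are taken as given, standing in for the combined trigger-word and trigram components.

```python
import math

# Hypothetical punctuation inventory ("" means no mark); the abstract
# does not list the marks actually used.
PUNCT = ["", ",", "."]

def acoustic_loglik(pause_sec, punct):
    """Toy log p(pause | punct) under an exponential density.

    The mean pause per mark is an invented illustrative value,
    not a figure from the paper.
    """
    mean = {"": 0.05, ",": 0.25, ".": 0.60}[punct]
    return -math.log(mean) - pause_sec / mean

def map_punctuation(pause_sec, lm_logprob, lm_weight=1.0):
    """Return the mark maximizing
    log p(pause | punct) + lm_weight * log P(punct | words)."""
    return max(
        PUNCT,
        key=lambda p: acoustic_loglik(pause_sec, p) + lm_weight * lm_logprob[p],
    )

# Usage: with an uninformative LM the pause alone decides; a confident
# LM can shift the decision for an ambiguous pause.
uniform = {p: math.log(1 / 3) for p in PUNCT}
confident = {"": math.log(0.1), ",": math.log(0.1), ".": math.log(0.8)}
print(repr(map_punctuation(0.3, uniform)))
print(repr(map_punctuation(0.3, confident)))
```

Because the two scores are simply added in the log domain, the acoustic and language components can be estimated on different corpora, which is the separation the abstract highlights.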

Keywords :

maximum likelihood estimation; multilayer perceptrons; speech recognition; speech synthesis; MLP-based trigger-word model; acoustic model; automatic punctuation generation; automatic speech recognition; language model; maximum a-posteriori; multilayer perceptron; speech-to-text transcription; trigram punctuation predictor; Acoustical engineering; Asia; Automatic speech recognition; Delay; Information science; Laboratories; Maximum a posteriori estimation; Predictive models; Speech recognition; Voice mail;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on

Conference_Location :

Merano

Print_ISBN :

978-1-4244-5478-5

Electronic_ISBN :

978-1-4244-5479-2

Type :

conf

DOI :

10.1109/ASRU.2009.5373365

Filename :

5373365

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2973076