Regional Information Center for Science and Technology - Using prompts to produce quality corpus for training automatic speech recognition systems

DocumentCode :

2756107

Title :

Using prompts to produce quality corpus for training automatic speech recognition systems

Author :

Lecouteux, Benjamin ; Linarès, Georges

Author_Institution :

Lab. Inf. d'Avignon (LIA), Univ. of Avignon, Avignon

fYear :

2008

fDate :

5-7 May 2008

Firstpage :

841

Lastpage :

846

Abstract :

In this paper we present an integrated unsupervised method for producing a quality corpus for training automatic speech recognition (ASR) systems from prompts or closed captions. Closed captions and prompts do not always have timestamps and do not necessarily correspond exactly to the speech. We propose a method for extracting a quality corpus from imperfect transcripts. The proposed approach works in two steps. First, during the search, the ASR system finds matching segments in a large prompt database. The matching segments are then used inside a driven decoding algorithm (DDA) to produce a high-quality corpus. Results show an F-measure of 96% in terms of spotting, while the DDA corrects the output according to the prompts, so a high-quality corpus is easily extracted.
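The abstract summarizes a two-step pipeline (spotting prompt segments, then driven decoding) whose details are not given in this record, so the DDA step is not reproduced here. As a rough illustration of the spotting step and of the F-measure used to score it, the Python sketch below matches an ASR hypothesis against a small prompt list using word-level `difflib.SequenceMatcher` similarity. This is a simple stand-in chosen for illustration, not the authors' ASR-based search; `best_prompt_match`, `f_measure`, and the 0.5 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def best_prompt_match(hypothesis: str, prompts: list[str], threshold: float = 0.5):
    """Return (index, score) of the prompt most similar to the ASR hypothesis,
    or None if no prompt reaches the (hypothetical) threshold.

    Illustrative stand-in only: similarity is SequenceMatcher's ratio
    (2 * matched words / total words) computed over word lists.
    """
    hyp_words = hypothesis.lower().split()
    best_idx, best_score = None, 0.0
    for i, prompt in enumerate(prompts):
        score = SequenceMatcher(None, hyp_words, prompt.lower().split()).ratio()
        if score > best_score:
            best_idx, best_score = i, score
    if best_idx is None or best_score < threshold:
        return None
    return best_idx, best_score


def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall for spotted segments."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    prompts = [
        "the weather will be sunny tomorrow",
        "parliament voted on the new budget",
    ]
    # One recognition error ("whether") still spots the first prompt.
    print(best_prompt_match("the whether will be sunny tomorrow", prompts))
    print(f_measure(96, 4, 4))
```

In this toy run, five of six words align with the first prompt, giving a similarity of 10/12. With equal false-positive and false-negative counts, precision equals recall, so the F-measure equals both.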

Keywords :

decoding; feature extraction; speech coding; speech recognition; unsupervised learning; automatic speech recognition systems; driven decoding algorithm; high quality corpus extraction; integrated unsupervised method; Abstracts; Automatic speech recognition; Costs; Databases; Decoding; Error analysis; Guidelines; Machine assisted indexing; Speech recognition; Transducers; automatic segmentation; closed captioning; corpus building; speech recognition;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Electrotechnical Conference, 2008. MELECON 2008. The 14th IEEE Mediterranean

Conference_Location :

Ajaccio

Print_ISBN :

978-1-4244-1632-5

Electronic_ISBN :

978-1-4244-1633-2

Type :

conf

DOI :

10.1109/MELCON.2008.4618540

Filename :

4618540

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2756107