DocumentCode :
2756107
Title :
Using prompts to produce quality corpus for training automatic speech recognition systems
Author :
Lecouteux, Benjamin ; Linarès, Georges
Author_Institution :
Lab. Inf. d'Avignon (LIA), Univ. of Avignon, Avignon
fYear :
2008
fDate :
5-7 May 2008
Firstpage :
841
Lastpage :
846
Abstract :
In this paper we present an integrated unsupervised method to produce a quality corpus for training automatic speech recognition (ASR) systems using prompts or closed captions. Closed captions and prompts do not always have timestamps and do not necessarily correspond to the exact speech. We propose a method that extracts a quality corpus from imperfect transcripts. The proposed approach works in two steps. During the search, the ASR system finds matching segments in a large prompt database. The matching segments are then used inside a driven decoding algorithm (DDA) to produce a high-quality corpus. Results show an F-measure of 96% in terms of spotting, while the DDA corrects the output according to the prompts, so a high-quality corpus is easily extracted.
Keywords :
decoding; feature extraction; speech coding; speech recognition; unsupervised learning; automatic speech recognition systems; driven decoding algorithm; high quality corpus extraction; integrated unsupervised method; Abstracts; Automatic speech recognition; Costs; Databases; Decoding; Error analysis; Guidelines; Machine assisted indexing; Speech recognition; Transducers; automatic segmentation; closed captioning; corpus building; speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Electrotechnical Conference, 2008. MELECON 2008. The 14th IEEE Mediterranean
Conference_Location :
Ajaccio
Print_ISBN :
978-1-4244-1632-5
Electronic_ISBN :
978-1-4244-1633-2
Type :
conf
DOI :
10.1109/MELCON.2008.4618540
Filename :
4618540