DocumentCode :
3630610
Title :
PDTSL: An annotated resource for speech reconstruction
Author :
Jan Hajic;Silvie Cinkova;Marie Mikulova;Petr Pajas;Jan Ptacek;Josef Toman;Zdenka Uresova
Author_Institution :
Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Malostranské nám. 25, 11800 Prague 1, Czech Republic
fYear :
2008
Firstpage :
93
Lastpage :
96
Abstract :
We present a description of a new resource (Prague Dependency Treebank of Spoken Language) being created for English and Czech to be used for the task of speech understanding, broad natural language analysis for dialog systems and other speech-related tasks, including speech editing. The resources we have created so far contain audio and a standard transcription of spontaneous speech, but as a novel layer, we add an edited ("reconstructed") version of the spoken utterances. These edits go beyond the scope of current speech reconstruction efforts in that we allow, on top of the usual deletions of speech artifacts, fillers, etc., also for word modifications, insertions and word order changes. We have used both monologue and dialogue recordings in English and Czech to verify the feasibility of such transcription. We have also assessed the quality of the resulting annotation since the relative freedom of the editing raises an issue of what a "correct" annotation is.
Keywords :
"Speech analysis","Natural languages","Speech recognition","Automatic speech recognition","Text recognition","Guidelines","Labeling","Mathematics","Physics","Vocabulary"
Publisher :
IEEE
Conference_Titel :
Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
Print_ISBN :
978-1-4244-3471-8
Type :
conf
DOI :
10.1109/SLT.2008.4777848
Filename :
4777848