DocumentCode :
2665403
Title :
Spiral construction of syntactically annotated spoken language corpus
Author :
Ohno, Tomohiro ; Matsubara, Shigeki ; Kawaguchi, Nobuo ; Inagaki, Yasuyoshi
Author_Institution :
Graduate Sch. of Inf. Sci., Nagoya Univ., Japan
fYear :
2003
fDate :
26-29 Oct. 2003
Firstpage :
477
Lastpage :
483
Abstract :
Spontaneous speech includes a broad range of linguistic phenomena characteristic of spoken language, and therefore a statistical approach would be effective for robust parsing of spoken language. Although stochastic parsing requires a large-scale syntactically annotated corpus, constructing one requires a lot of human resources. We propose a method of efficiently constructing a spoken language corpus annotated with dependency analyses. This method uses an existing spoken language corpus. A stochastic dependency parser is employed to tag spoken language sentences with dependency structures, and the results are corrected manually. The tagged corpus is constructed in a spiral fashion, wherein the corrected data are used as the statistical information for automatic parsing of other data. This spiral approach reduces parsing errors and, in turn, the correction cost. An experiment using 10,995 Japanese utterances shows the spiral approach to be effective for efficient corpus construction.
Keywords :
computational linguistics; grammars; natural languages; speech processing; stochastic processes; language database; stochastic dependency parsing; syntactically annotated spoken language corpus construction; tagged corpus; Costs; Error correction; Information science; Information technology; Large-scale systems; Natural language processing; Natural languages; Robustness; Spirals; Stochastic processes;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
Type :
conf
DOI :
10.1109/NLPKE.2003.1275953
Filename :
1275953