Title :
Spiral construction of syntactically annotated spoken language corpus
Author :
Ohno, Tomohiro ; Matsubara, Shigeki ; Kawaguchi, Nobuo ; Inagaki, Yasuyoshi
Author_Institution :
Graduate School of Information Science, Nagoya University, Japan
Abstract :
Spontaneous speech includes a broad range of linguistic phenomena characteristic of spoken language, so a statistical approach should be effective for robust parsing of spoken language. Although stochastic parsing requires a large-scale syntactically annotated corpus, constructing such a corpus demands considerable human effort. We propose a method for efficiently constructing a spoken language corpus annotated with dependency structures, starting from an existing spoken language corpus. A stochastic dependency parser tags the spoken language sentences with dependency structures, and the results are then corrected manually. The tagged corpus is constructed in a spiral fashion, wherein the corrected data are used as statistical information for automatically parsing further data. This spiral approach reduces parsing errors and thereby lowers the cost of manual correction. An experiment using 10,995 Japanese utterances shows the spiral approach to be effective for efficient corpus construction.
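The spiral loop described in the abstract (parse a batch, correct it manually, add the corrected data to the training pool, parse the next batch) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy "parser" simply picks, for each phrase unit (assumed bunsetsu-like, with head-final dependencies as in Japanese), the head offset most often seen in the corrected data, and the manual correction step is simulated by comparing against gold heads. The names `train`, `parse`, and `spiral_construct` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Assumption: an utterance is a list of (unit, head) pairs, where head is the
# index of the later unit it modifies, or -1 for the final (root) unit.
Utterance = List[Tuple[str, int]]


def train(corpus: List[Utterance]) -> Dict[str, Counter]:
    """Count how far ahead each unit's head lies (a toy stand-in for a stochastic model)."""
    stats: Dict[str, Counter] = defaultdict(Counter)
    for utt in corpus:
        for i, (unit, head) in enumerate(utt):
            if head >= 0:
                stats[unit][head - i] += 1
    return stats


def parse(stats: Dict[str, Counter], units: List[str]) -> List[int]:
    """Give each unit its most frequent head offset, defaulting to the next unit."""
    n = len(units)
    heads = []
    for i, unit in enumerate(units):
        if i == n - 1:
            heads.append(-1)  # the last unit is the root
            continue
        offset = stats[unit].most_common(1)[0][0] if stats.get(unit) else 1
        heads.append(min(i + offset, n - 1))  # keep the head inside the utterance
    return heads


def spiral_construct(batches: List[List[Utterance]]) -> List[int]:
    """Parse each batch with statistics from all previously corrected data.

    Returns the number of simulated manual corrections needed per batch.
    """
    corrected: List[Utterance] = []
    costs = []
    for batch in batches:
        stats = train(corrected)  # retrain on everything corrected so far
        cost = 0
        for gold in batch:
            predicted = parse(stats, [u for u, _ in gold])
            # Simulated annotator: each wrong head counts as one correction.
            cost += sum(p != h for p, (_, h) in zip(predicted, gold))
            corrected.append(gold)  # the corrected utterance joins the pool
        costs.append(cost)
    return costs


if __name__ == "__main__":
    utt = [("A", 2), ("B", 2), ("C", -1)]
    print(spiral_construct([[utt], [utt], [utt]]))  # prints [1, 0, 0]
```

In this toy run, the first batch is parsed with no statistics and needs one correction; once that corrected utterance feeds back into training, later batches need none. This mirrors, in miniature, the abstract's claim that reusing corrected data lowers the correction cost.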
Keywords :
computational linguistics; grammars; natural languages; speech processing; stochastic processes; language database; stochastic dependency parsing; syntactically annotated spoken language corpus construction; tagged corpus; Costs; Error correction; Information science; Information technology; Large-scale systems; Natural language processing; Natural languages; Robustness; Spirals; Stochastic processes;
Conference_Title :
Proceedings of the 2003 International Conference on Natural Language Processing and Knowledge Engineering
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
DOI :
10.1109/NLPKE.2003.1275953