DocumentCode
2406550
Title
Linguistics-oriented language resource development at the National Institute for Japanese Language and Linguistics
Author
Maekawa, Kikuo
Author_Institution
Dept. Corpus Studies, Nat. Inst. for Japanese Language & Linguistics, Japan
fYear
2011
fDate
26-28 Oct. 2011
Firstpage
1
Lastpage
6
Abstract
This talk introduces the language-resource-related activities of the National Institute for Japanese Language and Linguistics (NINJAL). From the latter half of the 1990s, the former National Language Research Institute (NLRI) played a central role in the development of Japanese language resources by constructing corpora such as the Corpus of Spontaneous Japanese (CSJ) and the Taiyo Corpus. In 2006, the language resource group of NLRI started a Japanese corpus compilation initiative named KOTONOHA and set about constructing the 100-million-word Balanced Corpus of Contemporary Written Japanese (BCCWJ). The activities of NLRI were inherited by the NINJAL Center for Corpus Development, reestablished in 2009. With the construction of the BCCWJ successfully completed in August 2011, the NINJAL center has set about two new exploratory projects: a historical corpus project and a 10-billion-word ultra-large-scale Web-based corpus project. In addition to the NLRI-NINJAL activities, language resource development at Japanese institutions other than NINJAL will be introduced briefly at the beginning. An application of the CSJ to the study of phonetics will also be demonstrated at the end.
Keywords
Internet; linguistics; natural language processing; speech processing; 10-billion-word ultra-large-scale Web-based corpus project; BCCWJ; Balanced Corpus of Contemporary Written Japanese; CSJ; Corpus of Spontaneous Japanese; Japanese corpus compilation initiative; Japanese language resource development; KOTONOHA; NINJAL Center for Corpus Development; NLRI; National Institute for Japanese Language and Linguistics; National Language Research Institute; Taiyo Corpus; linguistics-oriented language resource development; Decision support systems; Helium; Corpus; NINJAL
fLanguage
English
Publisher
IEEE
Conference_Titel
Speech Database and Assessments (Oriental COCOSDA), 2011 International Conference on
Conference_Location
Hsinchu
Print_ISBN
978-1-4577-0930-2
Type
conf
DOI
10.1109/ICSDA.2011.6085971
Filename
6085971