Title :
Corpus annotation in inflectional languages: Czech
Author :
Pala, Karel ; Rychly, Pavel ; Smrz, Pavel
Author_Institution :
Fac. of Inf., Masaryk Univ., Brno, Czech Republic
Abstract :
We offer basic information about DESAM, a grammatically annotated and fully disambiguated Czech corpus, and about its structure. The system and its method of tagging and disambiguation are also briefly described. Further, we deal with the tagset used in the annotation of DESAM and explain how the tagset is structured to cope with a highly inflectional language such as Czech. We mention the tools used for managing the corpus, particularly the corpus query processor CQP. The main focus is on examining the relations between the size of the DESAM tagset and the measures of ambiguity observed for particular tags. The reliability of tagging with regard to the inventory of tags is also explored. Some considerations based on statistical techniques of disambiguation are presented.
Keywords :
computational linguistics; grammars; natural languages; query processing; statistical analysis; CQP; Czech; DESAM; corpus annotation; corpus query processor; disambiguation; grammatical annotation; inflectional languages; reliability; statistical techniques; tagging; Buildings; Computational linguistics; Dictionaries; Informatics; Information retrieval; Natural language processing; Particle measurements; Size measurement; Statistics; Tagging
Conference_Title :
Database and Expert Systems Applications, 1998. Proceedings. Ninth International Workshop on
Conference_Location :
Vienna
Print_ISBN :
0-8186-8353-8
DOI :
10.1109/DEXA.1998.707395