DocumentCode
2265879
Title
Corpus annotation in inflectional languages: Czech
Author
Pala, Karel ; Rychly, Pavel ; Smrz, Pavel
Author_Institution
Fac. of Inf., Masaryk Univ., Brno, Czech Republic
fYear
1998
fDate
25-28 Aug 1998
Firstpage
149
Lastpage
153
Abstract
We present basic information about DESAM, a grammatically annotated and fully disambiguated Czech corpus, and its structure. The system and its method of tagging and disambiguation are also briefly described. Further, we deal with the tagset used in the annotation of DESAM and explain how it is structured to cope with a highly inflectional language such as Czech. We mention the tools used for managing the corpus, particularly the corpus query processor CQP. The main focus is on examining the relations between the size of the DESAM tagset and the measures of ambiguity observed for particular tags. The reliability of tagging with regard to the inventory of tags is also explored. Some considerations based on statistical techniques of disambiguation are presented.
Keywords
computational linguistics; grammars; natural languages; query processing; statistical analysis; CQP; Czech; DESAM; computational linguistics; corpus annotation; corpus query processor; disambiguation; grammatical annotation; inflectional languages; reliability; statistical techniques; tagging; Buildings; Computational linguistics; Dictionaries; Informatics; Information retrieval; Natural language processing; Particle measurements; Size measurement; Statistics; Tagging;
fLanguage
English
Publisher
IEEE
Conference_Titel
Database and Expert Systems Applications, 1998. Proceedings. Ninth International Workshop on
Conference_Location
Vienna
Print_ISBN
0-8186-8353-8
Type
conf
DOI
10.1109/DEXA.1998.707395
Filename
707395