• DocumentCode
    2265879
  • Title

    Corpus annotation in inflectional languages: Czech

  • Author

    Pala, Karel ; Rychly, Pavel ; Smrz, Pavel

  • Author_Institution
    Fac. of Inf., Masaryk Univ., Brno, Czech Republic
  • fYear
    1998
  • fDate
    25-28 Aug 1998
  • Firstpage
    149
  • Lastpage
    153
  • Abstract
    We offer basic information about DESAM, a grammatically annotated and fully disambiguated Czech corpus, and about its structure. The system and its method of tagging and disambiguation are briefly described as well. Further, we deal with the tagset used in the annotation of DESAM and explain how the tagset is structured to cope with a highly inflectional language such as Czech. We mention the tools used for its management, particularly the corpus query processor CQP. Most attention is paid to examining the relations between the size of the DESAM tagset and the measures of ambiguity observed for particular tags. The reliability of tagging with regard to the inventory of tags is also explored. Some considerations based on statistical techniques of disambiguation are presented.
  • Keywords
    computational linguistics; grammars; natural languages; query processing; statistical analysis; CQP; Czech; DESAM; corpus annotation; corpus query processor; disambiguation; grammatical annotation; inflectional languages; reliability; statistical techniques; tagging; Buildings; Computational linguistics; Dictionaries; Informatics; Information retrieval; Natural language processing; Particle measurements; Size measurement; Statistics; Tagging
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Applications, 1998. Proceedings. Ninth International Workshop on
  • Conference_Location
    Vienna
  • Print_ISBN
    0-8186-8353-8
  • Type

    conf

  • DOI
    10.1109/DEXA.1998.707395
  • Filename
    707395