• Title of article

    Designing a tagset for annotating the Tuvan National Corpus

  • Author/Authors

    Bayyr-ool, Aziyana (Institute of Philology); Voinov, Vitaly (University of Texas at Arlington)

  • Issue Information
    Quarterly, serial issue, 2012
  • Pages
    24
  • From page
    1
  • To page
    24
  • Abstract
    This paper examines various aspects of designing a part-of-speech (POS) tagset for annotating a textual corpus in the Tuvan language of Siberia (Turkic family). The issues raised are relevant by extension to designing tagsets for other languages. Preliminary issues discussed are Tuvan linguistic structure, the rationale for preferring a POS tagset at the initial stages of corpus design, the metalanguage and orthography of the tagset, and the potential usefulness of existing tagsets for designing a new tagset. The paper then presents the specific linguistic attributes that are encoded in the Tuvan tagset, using the three-level model of Major Class > Subclass > Features. Difficulties involved in deciding whether a specific type of word is a major class or a subclass are illustrated with Tuvan language data. The actual structure of the individual tags to be used in the tagset is also discussed, examining several existing models that differ in terms of transparency and level of linguistic detail. Sample Tuvan words that have been tagged using the system laid out in the paper are provided to illustrate how this tagset design facilitates searching for decomposable morphosyntactic elements relevant to the grammatical structure of Tuvan (as well as that of other Turkic languages).
  • Journal title
    International Journal of Language Studies
  • Serial Year
    2012
  • Record number

    1163235