Title of article :
Developing a Comprehensive Standard Persian Positional Tagset
Author/Authors :
Mahdavi, Mohammad Amin
Pages :
26
From page :
165
To page :
190
Abstract :
One of the primary tools used in text processing tasks such as information retrieval, text extraction, and text mining is a corpus enhanced with linguistic tags. In a corpus development effort, the role of a POS tagger is to assign a linguistic tag to every textual token. POS annotation relies heavily on a tagset based on a linguistic theory. Text processing in Persian follows this common practice as well. Several tagsets have been introduced so far to annotate Persian corpora. However, each tagset has followed a specific standard and linguistic theory. The resulting tagsets contain a limited number of tags, which renders them inadequate for a larger scope of research. This study draws on the EAGLES, MULTEXT-East, and positional tagset standards to produce a comprehensive standard positional tagset for Persian. The proposed tagset is also informed by the existing Persian tagsets. The proposed Persian Positional Tagset (PPT) is designed to be used for morphological, lexical, and syntactic annotations of Persian corpora.
Keywords :
Persian Positional Tagset, Persian POS tagset, Standard Persian Tagset, Persian Morphosyntactic tagset
Journal title :
Astroparticle Physics
Serial Year :
2018
Record number :
2435548