• DocumentCode
    642356
  • Title

    NLTK tagger for Albanian using iterative approach

  • Author

    Kadriu, A.

  • Author_Institution
    South East Eur. Univ., Tetove, Macedonia
  • fYear
    2013
  • fDate
    24-27 June 2013
  • Firstpage
    283
  • Lastpage
    288
  • Abstract
    This paper presents a tagging model for Albanian texts built with the NLTK toolkit. The model cascades three taggers with backoff. It uses a dictionary of around 32,000 words with their corresponding POS tags, together with a set of regular-expression rules. A lemmatization module converts nouns and verbs to their lemmas. The text is first tagged with a unigram tagger based on the dictionary, which serves as the baseline tagger for a regular-expression tagger. Incorrectly lemmatized words are then corrected, and the corrections form a third, lookup tagger, which is used with the first and second taggers as backoff.
  • Keywords
    dictionaries; iterative methods; natural language processing; text analysis; Albanian language; Albanian text; NLTK tagger; NLTK toolkit; POS tags; dictionary; iterative approach; lemmatize module; lemmatized words; lookup tagger; nouns; regular expressions rules; regular expressions tagger; taggers cascading; tagging model; text tagging; unigram tagger; verbs; Accuracy; Dictionaries; Economics; Hidden Markov models; Mood; Tagging; Training; Albanian language; NLTK; POS tagging;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology Interfaces (ITI), Proceedings of the ITI 2013 35th International Conference on
  • Conference_Location
    Cavtat
  • ISSN
    1334-2762
  • Print_ISBN
    978-953-7138-30-1
  • Type

    conf

  • DOI
    10.2498/iti.2013.0565
  • Filename
    6649039
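
The backoff cascade described in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library rather than NLTK itself; in NLTK the same idea is expressed through the `backoff` parameter of taggers such as `UnigramTagger` and `RegexpTagger`. The class names, the toy dictionary, the two regular-expression rules, the correction entry, and the tag labels below are all hypothetical and are not taken from the paper. The ordering of the cascade (corrections first, then rules, then dictionary) is also an assumption based on one reading of the abstract.

```python
import re


class BackoffTagger:
    """Base class: try this tagger first, then defer to the backoff tagger."""

    def __init__(self, backoff=None):
        self.backoff = backoff

    def choose_tag(self, word):
        # Subclasses return a tag string, or None if they cannot decide.
        raise NotImplementedError

    def tag_word(self, word):
        tag = self.choose_tag(word)
        if tag is None and self.backoff is not None:
            return self.backoff.tag_word(word)
        return tag

    def tag(self, words):
        return [(word, self.tag_word(word)) for word in words]


class LookupTagger(BackoffTagger):
    """Tags a word by looking up its lowercased form in a word -> tag table."""

    def __init__(self, table, backoff=None):
        super().__init__(backoff)
        self.table = table

    def choose_tag(self, word):
        return self.table.get(word.lower())


class RegexTagger(BackoffTagger):
    """Tags a word with the tag of the first matching pattern, if any."""

    def __init__(self, patterns, backoff=None):
        super().__init__(backoff)
        self.patterns = [(re.compile(p), t) for p, t in patterns]

    def choose_tag(self, word):
        for pattern, tag in self.patterns:
            if pattern.match(word):
                return tag
        return None


# Hypothetical toy data (the paper's dictionary has around 32,000 entries).
dictionary = {"libër": "N", "dhe": "CONJ", "shkoj": "V"}
patterns = [(r"^\d+$", "NUM"), (r".*isht$", "ADV")]  # illustrative rules only
corrections = {"shkova": "V"}  # stands in for the corrected lookup entries

# Assumed order: corrections -> regular expressions -> dictionary.
dictionary_tagger = LookupTagger(dictionary)
regex_tagger = RegexTagger(patterns, backoff=dictionary_tagger)
cascade = LookupTagger(corrections, backoff=regex_tagger)

tagged = cascade.tag(["Libër", "dhe", "zakonisht", "2013", "shkova", "xyz"])
```

Each tagger answers only when it has an entry or a matching rule and otherwise hands the word to the next tagger in the chain, so a word unknown to all three is left with the tag `None`.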