• DocumentCode
    2353150
  • Title

    Chunker for Tamil

  • Author

    Dhanalakshmi, V. ; Padmavathy, P. ; Anand, Kumar M. ; Soman, K.P. ; Rajendran, S.

  • Author_Institution
    Comput. Eng. & Networking, Amrita Vishwa Vidyapeetham, Coimbatore, India
  • fYear
    2009
  • fDate
    27-28 Oct. 2009
  • Firstpage
    436
  • Lastpage
    438
  • Abstract
    This paper presents a chunker for Tamil built with machine learning techniques. Chunking is the task of identifying and segmenting text into syntactically correlated word groups. The chunker is trained with machine learning techniques, so the linguistic knowledge is extracted automatically from an annotated corpus. We developed our own tagset for annotating the corpus, which is used to train and test the POS tagger generator and the chunker. The tagset consists of thirty POS tags and nine chunk tags. A corpus of two hundred and twenty-five thousand words was used to train the chunker and test its accuracy. We found that CRF++ gives the most encouraging results for the Tamil chunker.
  • Keywords
    learning (artificial intelligence); natural language processing; text analysis; CRF++; Tamil chunker; chunking; corpus annotation; linguistical knowledge; machine learning technique; syntactically correlated word group; text identification; text segmentation; Computer networks; Entropy; Guidelines; Hidden Markov models; Machine learning; Natural languages; Support vector machines; Tagging; Testing; Text recognition; Annotated corpus; Machine learning techniques
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advances in Recent Technologies in Communication and Computing, 2009. ARTCom '09. International Conference on
  • Conference_Location
    Kottayam, Kerala
  • Print_ISBN
    978-1-4244-5104-3
  • Electronic_ISBN
    978-0-7695-3845-7
  • Type

    conf

  • DOI
    10.1109/ARTCom.2009.191
  • Filename
    5329357