• DocumentCode
    3075713
  • Title
    Shallow Parsing for Hindi - An extensive analysis of sequential learning algorithms using a large annotated corpus
  • Author
    Gahlot, Himanshu ; Krishnarao, Awaghad Ashish ; Kushwaha, D.S.
  • Author_Institution
    Motilal Nehru Nat. Inst. of Technol., Allahabad
  • fYear
    2009
  • fDate
    6-7 March 2009
  • Firstpage
    1158
  • Lastpage
    1163
  • Abstract
    In this paper, we provide the first comprehensive comparison of methods for part-of-speech tagging and chunking for Hindi. We analyze the application of three major learning algorithms (viz. Maximum Entropy models [2] [9], Conditional Random Fields [12] and Support Vector Machines [8]) to part-of-speech tagging and chunking for the Hindi language using datasets of different sizes. The use of language-independent features makes this analysis more general, so its conclusions can carry over to similar South and Southeast Asian languages. The results show that CRFs outperform SVMs and Maxent in terms of accuracy. Using Conditional Random Fields, we achieve an accuracy of 92.26% for part-of-speech tagging and 93.57% for chunking. The training corpus contained 138,177 annotated instances. We report results for the three learning algorithms under varying conditions (clustering, BIEO vs. BIES notation, multiclass methods for SVMs, etc.) and present an extensive analysis of the whole process. These results should help future researchers shape their work in light of the comparative performance of the major algorithms on datasets of various sizes and under various conditions.
  • Keywords
    grammars; learning (artificial intelligence); natural language processing; support vector machines; word processing; Hindi language; South Asian Languages; South East Asian Languages; conditional random fields algorithm; language independent features; large annotated corpus; part-of-speech chunking; part-of-speech tagging; sequential learning algorithms; shallow parsing; Algorithm design and analysis; Clustering algorithms; Entropy; Hidden Markov models; Machine learning; Natural languages; Speech; Stochastic processes; Support vector machines; Tagging;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Advance Computing Conference, 2009. IACC 2009. IEEE International
  • Conference_Location
    Patiala
  • Print_ISBN
    978-1-4244-2927-1
  • Electronic_ISBN
    978-1-4244-2928-8
  • Type
    conf
  • DOI
    10.1109/IADCC.2009.4809178
  • Filename
    4809178