• DocumentCode
    3725738
  • Title
    Classification of children stories in Hindi using keywords and POS density
  • Author
    D M Harikrishna; K. Sreenivasa Rao
  • Author_Institution
    Indian Institute of Technology Kharagpur, India
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    The main objective of this work is to classify Hindi stories into three genres: fable, folk-tale and legend. In this paper, we propose a framework for story classification using keyword-based and part-of-speech (POS) based features. Keyword-based features such as term frequency (TF) and term frequency-inverse document frequency (TF-IDF) are used. The effect of POS tags such as noun, pronoun and adjective is analyzed for each story genre. Classification performance is analyzed for different combinations of features with three classifiers: Naive Bayes (NB), k-nearest neighbour (KNN) and support vector machine (SVM). The experimental studies show that combining linguistic and keyword-based features does not significantly improve classifier performance. Among the classifiers, the SVM models outperformed the others.
  • Keywords
    "Support vector machines","Pragmatics","Conferences","Computers","Text categorization","Tagging"
  • Publisher
    IEEE
  • Conference_Title
    Computer, Communication and Control (IC4), 2015 International Conference on
  • Type
    conf
  • DOI
    10.1109/IC4.2015.7375666
  • Filename
    7375666
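The abstract names three feature families (TF, TF-IDF and POS density) without defining them. The following is a minimal Python sketch of one common way to compute them, assuming the stories are already tokenized and POS-tagged. The function names, the romanized toy tokens, the tag labels (`NN`, `PRP`, `JJ`, ...) and the exact normalizations (TF divided by story length, IDF as log(N / df)) are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

# Assumption: each story is a list of (token, POS tag) pairs produced by some
# earlier tagging step. Tokens and tags used below are illustrative only.
Story = list[tuple[str, str]]


def term_frequency(story: Story) -> dict[str, float]:
    """TF: count of each token divided by the number of tokens in the story."""
    counts = Counter(tok for tok, _ in story)
    n = len(story)
    return {tok: c / n for tok, c in counts.items()}


def inverse_document_frequency(stories: list[Story]) -> dict[str, float]:
    """IDF: log(N / df), where df is the number of stories containing the token."""
    df = Counter()
    for story in stories:
        df.update({tok for tok, _ in story})  # set: count each story once
    n_docs = len(stories)
    return {tok: math.log(n_docs / d) for tok, d in df.items()}


def tf_idf(story: Story, idf: dict[str, float]) -> dict[str, float]:
    """TF-IDF: TF weighted by IDF; tokens absent from the IDF table get weight 0."""
    return {tok: tf * idf.get(tok, 0.0) for tok, tf in term_frequency(story).items()}


def pos_density(story: Story, tags=("NN", "PRP", "JJ")) -> dict[str, float]:
    """POS density: fraction of the story's tokens that carry each given tag."""
    counts = Counter(tag for _, tag in story)
    n = len(story)
    return {tag: counts[tag] / n for tag in tags}


if __name__ == "__main__":
    # Hypothetical two-story toy corpus (romanized Hindi for readability).
    stories = [
        [("raja", "NN"), ("ne", "PSP"), ("kaha", "VM"), ("raja", "NN")],
        [("sher", "NN"), ("bahut", "INTF"), ("chalak", "JJ"), ("tha", "VM")],
    ]
    idf = inverse_document_frequency(stories)
    print(tf_idf(stories[0], idf))
    print(pos_density(stories[0]))  # {'NN': 0.5, 'PRP': 0.0, 'JJ': 0.0}
```

Under this IDF definition, a token that occurs in every story gets weight 0, which is why TF-IDF tends to down-weight function words that plain TF keeps. The resulting per-story dictionaries would still have to be mapped onto a fixed feature vector before being passed to NB, KNN or SVM classifiers; that step is not shown here.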