• DocumentCode
    1884892
  • Title
    Semantic based features selection and weighting method for text classification
  • Author
    Khan, Aurangzeb ; Baharudin, Baharum ; Khan, Khairullah
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Univ. Teknol. PETRONAS, Tronoh, Malaysia
  • Volume
    2
  • fYear
    2010
  • fDate
    15-17 June 2010
  • Firstpage
    850
  • Lastpage
    855
  • Abstract
    Feature selection and weighting are of vital concern in the text classification process, as they improve the efficiency and accuracy of the text classifier. The Vector Space Model is used to represent documents through the “Bag of Words” (BOW) model with term weighting. Document representation through this model has limitations: it ignores term dependencies and the structure and ordering of terms in documents. To overcome this problem, a Semantics Base Feature Vector using Part of Speech (POS) is proposed, which extracts the concepts of terms using WordNet together with co-occurring and associated terms. The proposed method is applied to a small document dataset, and the results show that it outperforms the term frequency/inverse document frequency (TF-IDF) with BOW feature selection method for text classification.
  • Keywords
    classification; text analysis; WordNet; bag of word feature selection; document dataset; document representation; part of speech; semantic based feature selection; semantics base feature vector; term frequency/inverse document frequency; term weighting phenomena; text classification process; text classifier; vector space model; Argon; Electronic learning; Equations; Ontologies; Support vector machine classification; Thin film transistors; POS; feature selection; feature vector; text classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology (ITSim), 2010 International Symposium in
  • Conference_Location
    Kuala Lumpur
  • ISSN
    2155-897
  • Print_ISBN
    978-1-4244-6715-0
  • Type
    conf
  • DOI
    10.1109/ITSIM.2010.5561563
  • Filename
    5561563