• DocumentCode
    2045391
  • Title
    Efficient Feature Selection and Domain Relevance Term Weighting Method for Document Classification
  • Author
    Khan, Aurangzeb ; Baharudin, Baharum ; Khan, Khairullah
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Univ. Teknol. PETRONAS, Tronoh, Malaysia
  • Volume
    2
  • fYear
    2010
  • fDate
    19-21 March 2010
  • Firstpage
    398
  • Lastpage
    403
  • Abstract
    Feature selection is of paramount concern in the document classification process because it improves the efficiency and accuracy of the text classifier. The Vector Space Model is used to represent the "Bag of Words" (BOW) of documents with term weighting. Representing documents through this model has some limitations: it ignores term dependencies and the structure and ordering of terms in documents. To overcome this problem, a semantic-based feature vector is proposed, which uses an ontology to extract the concept of a term along with its co-occurring and associated terms. The proposed method is applied to a small document dataset, and the results show that it outperforms term frequency-inverse document frequency (TF-IDF) with the BOW feature selection method for text classification.
  • Keywords
    feature extraction; ontologies (artificial intelligence); pattern classification; text analysis; bag of word; document classification process; domain relevance term weighting method; feature selection; ontology; semantic base feature vector; term frequency inverse document frequency; text classifier; vector space model; Application software; Computer applications; Data mining; Extraterrestrial phenomena; Frequency; Machine learning; Ontologies; Organizing; Text categorization; Web sites; Feature selection; Feature vector; Ontology; Text classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Engineering and Applications (ICCEA), 2010 Second International Conference on
  • Conference_Location
    Bali Island
  • Print_ISBN
    978-1-4244-6079-3
  • Electronic_ISBN
    978-1-4244-6080-9
  • Type
    conf
  • DOI
    10.1109/ICCEA.2010.228
  • Filename
    5445679