• DocumentCode
    498433
  • Title
    Improving Documents Classification with Semantic Features
  • Author
    Rujiang, Bai ; Junhua, Liao
  • Author_Institution
    Shandong Univ. of Technol. Libr., Zibo, China
  • Volume
    1
  • fYear
    2009
  • fDate
    22-24 May 2009
  • Firstpage
    640
  • Lastpage
    643
  • Abstract
    Successful text classification depends heavily on the document representation used. Most current approaches adopt the "bag-of-words" (BOW) representation, in which the frequency of occurrence of each word is treated as the most important feature; this representation ignores important semantic relationships between key terms. In this paper, we propose a system that uses ontologies and natural language processing techniques to index texts. The traditional BOW matrix is replaced by a "Bag of Concepts" (BOC) matrix. For this purpose, we developed fully automated methods for mapping keywords to their corresponding ontology concepts. The Support Vector Machine, a successful machine learning technique, is used for classification. Experimental results show that the proposed method does improve text classification performance significantly.
  • Keywords
    classification; learning (artificial intelligence); natural language processing; ontologies (artificial intelligence); support vector machines; text analysis; bag of concepts; bag-of-words; documents classification; machine learning; natural language processing techniques; ontologies; semantic features; support vector machine; Electronic mail; Frequency; Indexing; Libraries; Machine learning; Ontologies; Support vector machine classification; Support vector machines; Text categorization; Vocabulary; RDF; SVM; ontology; text classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electronic Commerce and Security, 2009. ISECS '09. Second International Symposium on
  • Conference_Location
    Nanchang
  • Print_ISBN
    978-0-7695-3643-9
  • Type
    conf
  • DOI
    10.1109/ISECS.2009.231
  • Filename
    5209650
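
The abstract's central idea, replacing a bag-of-words representation with a "Bag of Concepts" by mapping keywords to ontology concepts, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy `TERM_TO_CONCEPT` dictionary stands in for a real ontology, the regex tokenizer stands in for the NLP pipeline, and keeping unmapped tokens unchanged is one possible design choice among several. The SVM step is not shown; the resulting count vectors could be passed to any SVM implementation.

```python
import re
from collections import Counter

# Hypothetical toy "ontology": surface term -> concept label.
# A real system would look terms up in an actual ontology (assumption).
TERM_TO_CONCEPT = {
    "car": "Vehicle",
    "automobile": "Vehicle",
    "truck": "Vehicle",
    "physician": "Doctor",
    "doctor": "Doctor",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Count raw token frequencies (the BOW representation)."""
    return Counter(tokenize(text))


def bag_of_concepts(text, term_to_concept=TERM_TO_CONCEPT):
    """Count concept frequencies (a BOC-style representation).

    Tokens with no entry in the mapping are kept unchanged; this
    fallback is an illustrative choice, not the paper's method.
    """
    return Counter(term_to_concept.get(tok, tok) for tok in tokenize(text))


doc_a = "The car stopped."
doc_b = "The automobile stopped."

# BOW treats "car" and "automobile" as unrelated features,
# while the concept mapping merges them into one feature.
print(bag_of_words(doc_a) == bag_of_words(doc_b))
print(bag_of_concepts(doc_a) == bag_of_concepts(doc_b))
```

In this sketch the two documents differ under BOW but coincide under the concept mapping, which is the kind of semantic relationship the abstract says word frequencies alone miss.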