• DocumentCode
    670567
  • Title
    A new text representation scheme combining Bag-of-Words and Bag-of-Concepts approaches for automatic text classification
  • Author
    Alahmadi, Ahmed ; Joorabchi, Arash ; Mahdi, Abdulhussain E.
  • Author_Institution
    Electron. & Comput. Eng. Dept., Univ. of Limerick, Limerick, Ireland
  • fYear
    2013
  • fDate
    17-20 Nov. 2013
  • Firstpage
    108
  • Lastpage
    113
  • Abstract
    This paper introduces a new approach to creating text representations and applies it to standard text classification collections. The approach supplements the well-known Bag-of-Words (BOW) representation scheme with a concept-based representation that utilises Wikipedia as a knowledge base. The proposed representations are used to generate a Vector Space Model, which in turn is fed into a Support Vector Machine classifier to categorise a collection of textual documents from two publicly available datasets. Experimental results are presented that evaluate the performance of the model against a standard BOW scheme, a concept-based scheme, and recently reported text representations that augment the standard BOW approach with concept-based representations.
  • Keywords
    pattern classification; support vector machines; text analysis; Wikipedia; automatic text classification; bag-of-concepts method; bag-of-words method; knowledge base; support vector machine classifier; text representation; textual document; vector space model; Electronic publishing; Encyclopedias; Internet; Support vector machine classification; Text categorization; Vectors; Bag-of-Concepts; Bag-of-Words; Text Classification; Wikipedia;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    GCC Conference and Exhibition (GCC), 2013 7th IEEE
  • Conference_Location
    Doha
  • Print_ISBN
    978-1-4799-0722-9
  • Type
    conf
  • DOI
    10.1109/IEEEGCC.2013.6705759
  • Filename
    6705759
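
  • Illustrative sketch (not part of the original record)
    The abstract describes supplementing Bag-of-Words features with concept-based features inside one Vector Space Model. The sketch below shows one simple way such a combined representation could be built; it is not the authors' method. The phrase-to-concept lexicon is a hypothetical stand-in for a Wikipedia-based concept annotator, the function names and the "w:"/"c:" feature prefixes are assumptions made for illustration, and raw counts are used where a real system might apply a weighting scheme. The Support Vector Machine step is omitted; the resulting matrix could be passed to any SVM implementation.

```python
import re
from collections import Counter

# Hypothetical phrase-to-concept lexicon standing in for a Wikipedia-based
# annotator. The entries are illustrative assumptions, not from the paper.
CONCEPT_LEXICON = {
    "support vector machine": "Support_vector_machine",
    "svm": "Support_vector_machine",
    "text classification": "Document_classification",
    "text categorization": "Document_classification",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(text):
    """Count how often each token occurs."""
    return Counter(tokenize(text))


def bag_of_concepts(text, lexicon=CONCEPT_LEXICON):
    """Count concepts found by greedy longest-phrase matching on tokens."""
    tokens = tokenize(text)
    max_len = max(len(phrase.split()) for phrase in lexicon)
    counts = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                counts[lexicon[phrase]] += 1
                i += n
                break
        else:
            i += 1  # no phrase starts at this token
    return counts


def combined_features(text):
    """Merge word and concept counts, prefixed so the two never collide."""
    features = Counter()
    for word, count in bag_of_words(text).items():
        features["w:" + word] = count
    for concept, count in bag_of_concepts(text).items():
        features["c:" + concept] = count
    return features


def vectorize(docs):
    """Return (vocabulary, matrix) with one count vector per document."""
    feats = [combined_features(d) for d in docs]
    vocab = sorted(set().union(*feats))
    return vocab, [[f.get(term, 0) for term in vocab] for f in feats]
```

    In this sketch, two documents that share no content words beyond "text" (one saying "SVM", the other "support vector machine") still overlap on the concept dimensions, which is the intuition behind adding concepts to a plain BOW vector.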