• DocumentCode
    2008550
  • Title

    Text Classification Using Tree Kernels and Linguistic Information

  • Author

    Goncalves, Tiago ; Quaresma, Paulo

  • Author_Institution
    Dep. Inf., Univ. de Evora, Evora
  • fYear
    2008
  • fDate
    11-13 Dec. 2008
  • Firstpage
    763
  • Lastpage
    768
  • Abstract
    Standard machine learning approaches to text classification use the bag-of-words representation of documents to derive the classification target function. Typical linguistic structures such as morphology, syntax and semantics are completely ignored in the learning process. This paper examines the role of these structures in classifier construction, applying the study to the Portuguese language. Classifiers are built using the SVM algorithm on a dataset of newspaper articles. The results show that syntactic structure is not useful for text classification (as initially expected), but a novel structured representation that uses the documents' semantic information has the same discriminative power over classes as the traditional bag-of-words one.
  • Keywords
    natural language processing; pattern classification; support vector machines; text analysis; Portuguese language; SVM algorithm; classification target function; documents bag-of-words representation; linguistic information; machine learning; text classification; tree kernels; Information representation; Information retrieval; Kernel; Machine learning; Machine learning algorithms; Morphology; Natural languages; Support vector machine classification; Support vector machines; Text categorization; SVM; linguistic information; text classification; tree kernels;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on
  • Conference_Location
    San Diego, CA
  • Print_ISBN
    978-0-7695-3495-4
  • Type

    conf

  • DOI
    10.1109/ICMLA.2008.78
  • Filename
    4725062