  • DocumentCode
    2345213
  • Title
    Exploiting Negative Categories and Wikipedia Structures for Document Classification
  • Author
    Murugeshan, Meenakshi Sundaram ; Lakshmi, K. ; Mukherjee, Saswati
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Guindy Anna Univ., Chennai, India
  • fYear
    2009
  • fDate
    27-28 Oct. 2009
  • Firstpage
    868
  • Lastpage
    872
  • Abstract
    This paper explores the effect of a profile-based method for classifying Wikipedia XML documents. Our approach builds two profiles, exploiting the whole content, the Initial Descriptions, and the links in the Wikipedia documents. To build the profiles we use negative category information, which has been shown to perform well for classifying unstructured texts. The performance of the Cosine and Fractional Similarity metrics is also compared. Using two classifiers and taking their weighted average improves classification performance.
  • Keywords
    Web sites; XML; document handling; pattern classification; Wikipedia XML documents; Wikipedia structures; cosine metrics; document classification; fractional similarity metrics; initial descriptions; negative categories; negative category information; Communications technology; Computer science; Educational institutions; Paper technology; Radio frequency; Testing; Wikipedia; Feature Selection; Multiple Classifiers; Negative Categories; Profile Creation; Similarity Measures; XML Classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advances in Recent Technologies in Communication and Computing, 2009. ARTCom '09. International Conference on
  • Conference_Location
    Kottayam, Kerala
  • Print_ISBN
    978-1-4244-5104-3
  • Electronic_ISBN
    978-0-7695-3845-7
  • Type
    conf
  • DOI
    10.1109/ARTCom.2009.79
  • Filename
    5328383