• DocumentCode
    2709424
  • Title
    A Robust Discriminative Term Weighting Based Linear Discriminant Method for Text Classification
  • Author
    Junejo, Khurum Nazir ; Karim, Asim
  • Author_Institution
    Dept. of Comput. Sci., LUMS Sch. of Sci. & Eng., Lahore
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    323
  • Lastpage
    332
  • Abstract
    Text classification is widely used in applications ranging from e-mail filtering to review classification. Many of these applications demand that the classification method be efficient and robust, yet produce accurate categorizations by using only the terms in the documents. We present a supervised text classification method based on discriminative term weighting, discrimination information pooling, and linear discrimination. Terms in the documents are assigned weights according to the discrimination information they provide for one category over the others. These weights also serve to partition the terms into two sets. A linear opinion pool is adopted for combining the discrimination information provided by each set of terms, yielding a two-dimensional feature space. Subsequently, a linear discriminant function is learned to categorize the documents in the feature space. We provide intuitive and empirical evidence of the robustness of our method with three term weighting strategies. Experimental results are presented for data sets from three different application areas. The results show that our method's accuracy is higher than that of other popular methods, especially when there is a distribution shift from training to testing sets. Moreover, our method is simple yet robust to different application domains and small training set sizes.
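The abstract describes a pipeline of discriminative term weighting, partitioning of terms into two sets, pooling of each set's discrimination information into a two-dimensional feature space, and a linear discriminant learned in that space. The Python sketch that follows is a minimal, hypothetical illustration of such a pipeline for two categories. It is not the authors' implementation. The term weight (absolute log-ratio of Laplace-smoothed class-conditional term probabilities), the pooling coefficients (each term's share of the document's known-term occurrences, in the spirit of a linear opinion pool), the discriminant learner (a grid search over the slope of a line through the origin), the function names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def fit_term_weights(docs, labels, smoothing=1.0):
    """Estimate a discriminative weight per term and the category it favours.

    ASSUMPTION: weight = |log(p(t|1) / p(t|0))| with Laplace smoothing; the paper
    evaluates three weighting strategies that may differ from this one.
    """
    vocab = {t for d in docs for t in d}
    counts = {0: Counter(), 1: Counter()}
    for d, y in zip(docs, labels):
        counts[y].update(d)
    totals = {c: sum(counts[c].values()) for c in (0, 1)}
    v = len(vocab)
    weights, favours = {}, {}
    for t in vocab:
        p1 = (counts[1][t] + smoothing) / (totals[1] + smoothing * v)
        p0 = (counts[0][t] + smoothing) / (totals[0] + smoothing * v)
        weights[t] = abs(math.log(p1 / p0))
        # The favoured category partitions the vocabulary into two term sets.
        favours[t] = 1 if p1 > p0 else 0
    return weights, favours


def pooled_features(doc, weights, favours):
    """Map a document to the 2-D point (score for category 0, score for category 1).

    Each known term contributes its weight to the set it favours, scaled by its
    share of the document's known-term occurrences (hypothetical pooling scheme).
    """
    tf = Counter(t for t in doc if t in weights)
    n = sum(tf.values())
    scores = [0.0, 0.0]
    if n == 0:
        return tuple(scores)
    for t, f in tf.items():
        scores[favours[t]] += (f / n) * weights[t]
    return tuple(scores)


def fit_discriminant(points, labels, steps=201):
    """Pick alpha for the rule 's1 - alpha * s0 > 0' by minimising training error.

    ASSUMPTION: this grid search stands in for the paper's discriminant learner.
    """
    best_alpha, best_err = 1.0, float("inf")
    for i in range(steps):
        alpha = 0.02 * i  # scans alpha over [0, 4]
        err = sum((s1 - alpha * s0 > 0) != (y == 1)
                  for (s0, s1), y in zip(points, labels))
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha


def predict(doc, weights, favours, alpha):
    s0, s1 = pooled_features(doc, weights, favours)
    return 1 if s1 - alpha * s0 > 0 else 0


# Hypothetical toy data: category 1 is spam-like, category 0 is work-like.
docs = [["cheap", "pills", "buy"], ["buy", "cheap", "offer"],
        ["meeting", "agenda", "notes"], ["project", "meeting", "report"]]
labels = [1, 1, 0, 0]

weights, favours = fit_term_weights(docs, labels)
points = [pooled_features(d, weights, favours) for d in docs]
alpha = fit_discriminant(points, labels)
print(predict(["cheap", "offer", "today"], weights, favours, alpha))  # prints 1
```

Keeping only two pooled scores per document is what makes the final discriminant cheap to learn: whatever the vocabulary size, the classifier operates in a two-dimensional space, which is consistent with the efficiency and small-training-set motivation stated in the abstract.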
  • Keywords
    classification; text analysis; discrimination information pooling; documents; linear opinion pool; robust discriminative term weighting based linear discriminant method; supervised text classification method; two-dimensional feature space; Application software; Computer science; Data engineering; Data mining; Electronic mail; Hybrid power systems; Information filtering; Robustness; Text categorization; Web pages; generative-discriminative algorithm; term weighting; text classification;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
  • Conference_Location
    Pisa
  • ISSN
    1550-4786
  • Print_ISBN
    978-0-7695-3502-9
  • Type
    conf
  • DOI
    10.1109/ICDM.2008.26
  • Filename
    4781127