• DocumentCode
    3213546
  • Title

    Cross-domain text classification using semantic based approach

  • Author

    Barathi, B.U.A.

  • Author_Institution
    Rajalakshmi Eng. Coll., Chennai, India
  • fYear
    2011
  • fDate
    20-22 July 2011
  • Firstpage
    820
  • Lastpage
    825
  • Abstract
    The Internet is a huge repository of disparate information growing at an exponential rate. Efficient and effective document retrieval and classification systems are required to turn the massive amount of data into useful information, and eventually into knowledge. A traditional approach to document classification requires labelled data in order to construct reliable and accurate classifiers. A co-clustering based classification algorithm has been previously proposed to tackle cross-domain text classification. In this work, we extend the idea underlying this approach by making the latent semantic relationship between the two domains explicit. Semantic-based cross-domain classification is performed by applying the algorithm in the extended vector space model of in-domain and out-of-domain documents. Semantic information was embedded within the document representation, and experiments showed that improved classification accuracy can be achieved. The concepts form individual features, without undergoing stemming or splitting of multi-word expressions.
  • Keywords
    pattern classification; pattern clustering; text analysis; Internet; co-clustering based classification algorithm; cross-domain text classification; document classification system; document representation; document retrieval system; in-domain document; multiword expression; out-of-domain document; semantic based approach; co-clustering; in-domain; out-of-domain; splitting; stemming;
  • fLanguage
    English
  • Publisher
    IET
  • Conference_Titel
    Sustainable Energy and Intelligent Systems (SEISCON 2011), International Conference on
  • Conference_Location
    Chennai
  • Type

    conf

  • DOI
    10.1049/cp.2011.0479
  • Filename
    6143428