Cross-domain text classification using semantic based approach

Author

Barathi, B.U.A.

Author_Institution

Rajalakshmi Eng. Coll., Chennai, India

fYear

2011

fDate

20-22 July 2011

Firstpage

820

Lastpage

825

Abstract

The Internet is a huge repository of disparate information growing at an exponential rate. Efficient and effective document retrieval and classification systems are required to turn the massive amount of data into useful information, and eventually into knowledge. The traditional approach to document classification requires labelled data in order to construct reliable and accurate classifiers. A co-clustering based classification algorithm has previously been proposed to tackle cross-domain text classification. This work extends the idea underlying that approach by making the latent semantic relationship between the two domains explicit. Semantic-based cross-domain classification is achieved by applying the algorithm in an extended vector space model of in-domain and out-of-domain documents. Semantic information is embedded within the document representation, and experiments show that improved classification accuracy can be achieved. The concepts form individual features, without undergoing stemming or splitting of multi-word expressions.

Keywords

pattern classification; pattern clustering; text analysis; Internet; co-clustering based classification algorithm; cross-domain text classification; document classification system; document representation; document retrieval system; in-domain document; multiword expression; out-of-domain document; semantic based approach; co-clustering; in-domain; out-of-domain; splitting; stemming;

fLanguage

English

Publisher

iet

Conference_Titel

Sustainable Energy and Intelligent Systems (SEISCON 2011), International Conference on

Conference_Location

Chennai

Type

conf

DOI

10.1049/cp.2011.0479

Filename

6143428