Title :
Cross-domain text classification using semantic based approach
Author_Institution :
Rajalakshmi Eng. Coll., Chennai, India
Abstract :
The Internet is a huge repository of disparate information growing at an exponential rate. Efficient and effective document retrieval and classification systems are required to turn this massive amount of data into useful information, and eventually into knowledge. The traditional approach to document classification requires labelled data in order to construct reliable and accurate classifiers. A co-clustering based classification algorithm has previously been proposed to tackle cross-domain text classification. This work extends the idea underlying that approach by making the latent semantic relationship between the two domains explicit. Semantic-based cross-domain classification is performed by running the algorithm in an extended vector space model of in-domain and out-of-domain documents. Semantic information is embedded within the document representation, and experiments show that improved classification accuracy can be achieved. The concepts form individual features, without undergoing stemming or splitting of multi-word expressions.
Keywords :
pattern classification; pattern clustering; text analysis; Internet; co-clustering based classification algorithm; cross-domain text classification; document classification system; document representation; document retrieval system; in-domain document; multiword expression; out-of-domain document; semantic based approach; co-clustering; in-domain; out-of-domain; splitting; stemming;
Conference_Title :
Sustainable Energy and Intelligent Systems (SEISCON 2011), International Conference on
Conference_Location :
Chennai
DOI :
10.1049/cp.2011.0479