DocumentCode
3213546
Title
Cross-domain text classification using semantic based approach
Author
Barathi, B.U.A.
Author_Institution
Rajalakshmi Eng. Coll., Chennai, India
fYear
2011
fDate
20-22 July 2011
Firstpage
820
Lastpage
825
Abstract
The Internet is a huge repository of disparate information growing at an exponential rate. Efficient and effective document retrieval and classification systems are required to turn this massive amount of data into useful information and, eventually, into knowledge. A traditional approach to document classification requires labelled data in order to construct reliable and accurate classifiers. A co-clustering based classification algorithm has previously been proposed to tackle cross-domain text classification. In this work, we extend the idea underlying this approach by making the latent semantic relationship between the two domains explicit. Semantic-based cross-domain classification is performed by applying the algorithm in an extended vector space model of in-domain and out-of-domain documents. Semantic information is embedded within the document representation, and experiments show that improved classification accuracy can be achieved. The concepts form individual features, without undergoing stemming or splitting of multi-word expressions.
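The abstract describes an extended vector space model in which semantic concepts sit alongside ordinary word features, with each concept kept as a single unstemmed, unsplit feature. The sketch below illustrates only that representation step, not the co-clustering classifier. It rests on several assumptions not in the source: a small hand-built, hypothetical concept dictionary (`CONCEPTS`) stands in for whatever concept source the paper uses, words are plain lowercase tokens (word stemming is omitted for brevity), and the `w:`/`c:` feature prefixes are an illustrative naming choice.

```python
import re
from collections import Counter

# Hypothetical concept dictionary: surface phrase -> concept label.
# Illustrative only; the source does not specify where concepts come from.
CONCEPTS = {
    "machine learning": "Machine_learning",
    "text classification": "Document_classification",
    "new york": "New_York",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extended_vector(text, concepts=CONCEPTS):
    """Return word features plus concept features for one document.

    Word features are plain tokens (prefixed 'w:'). Each matched concept
    phrase adds one count to a single concept feature (prefixed 'c:'),
    so multi-word concepts are neither stemmed nor split into words.
    """
    tokens = tokenize(text)
    features = Counter("w:" + t for t in tokens)
    for phrase, label in concepts.items():
        parts = phrase.split()
        n = len(parts)
        # Slide a window of n tokens over the document to find the phrase.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == parts:
                features["c:" + label] += 1
    return features


if __name__ == "__main__":
    doc = "Text classification with machine learning in New York."
    print(extended_vector(doc))
```

Because in-domain and out-of-domain documents are mapped through the same dictionary, documents that share a concept also share a feature even when their surrounding vocabulary differs; this is one plausible reading of how explicit semantics could bridge the two domains.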
Keywords
pattern classification; pattern clustering; text analysis; Internet; co-clustering based classification algorithm; cross-domain text classification; document classification system; document representation; document retrieval system; in-domain document; multiword expression; out-of-domain document; semantic based approach; co-clustering; in-domain; out-of-domain; splitting; stemming;
fLanguage
English
Publisher
IET
Conference_Titel
Sustainable Energy and Intelligent Systems (SEISCON 2011), International Conference on
Conference_Location
Chennai
Type
conf
DOI
10.1049/cp.2011.0479
Filename
6143428