DocumentCode :
2332817
Title :
Using complex linguistic features in context-sensitive text classification techniques
Author :
Wong, Alex K. S.; Lee, John W. T.; Yeung, Daniel S.
Author_Institution :
Dept. of Comput., Hong Kong Polytech. Univ., China
Volume :
5
fYear :
2005
fDate :
18-21 Aug. 2005
Firstpage :
3183
Abstract :
Text classification (TC) is the task of automatically classifying documents based on learned document features. Many popular TC models use the simple occurrence of words in a document as features. They also commonly assume, by design, that word occurrences are statistically independent. Although this assumption obviously does not hold in general, these TC models have proven robust and efficient at their task. Some recent studies have shown that context-sensitive TC approaches, which take context into consideration in the form of word co-occurrences, generally perform better. On the other hand, many studies have examined the use of complex linguistic or semantic features, instead of simple word occurrences, as features for information retrieval and classification tasks. While these complex features may intuitively be more relevant to the tasks concerned, the results of these studies on their effectiveness have been mixed and inconclusive. In this paper we present our investigation into the use of some complex linguistic features with a context-sensitive TC method. Our experimental results show some potential advantages of such an approach.
Keywords :
classification; computational linguistics; context-sensitive languages; text analysis; automatic document classification; complex linguistic feature; context-sensitive text classification; learned document feature; semantic feature; word occurrence; Cybernetics; Electronic mail; Feature extraction; Information retrieval; Machine learning; Machine learning algorithms; Robustness; Text categorization; Text processing; Text classification; complex linguistics feature; context-sensitive; semantics;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location :
Guangzhou, China
Print_ISBN :
0-7803-9091-1
Type :
conf
DOI :
10.1109/ICMLC.2005.1527491
Filename :
1527491