  • DocumentCode
    2332817
  • Title

    Using complex linguistic features in context-sensitive text classification techniques

  • Author

    Wong, Alex K S ; Lee, John W T ; Yeung, Daniel S.

  • Author_Institution
    Dept. of Comput., Hong Kong Polytech. Univ., China
  • Volume
    5
  • fYear
    2005
  • fDate
    18-21 Aug. 2005
  • Firstpage
    3183
  • Abstract
    Text classification (TC) is the task of automatically classifying documents based on learned document features. Many popular TC models use the simple occurrence of words in a document as features, and they commonly assume by design that word occurrences are statistically independent. Although this assumption clearly does not hold in general, these TC models have proved robust and efficient at their task. Some recent studies have shown that context-sensitive TC approaches, which take context into account in the form of word co-occurrences, generally perform better. On the other hand, many studies have examined the use of complex linguistic or semantic features, instead of simple word occurrences, as features for information retrieval and classification tasks. While these complex features may intuitively be more relevant to the tasks concerned, the results of these studies on their effectiveness have been mixed and inconclusive. In this paper we present our investigation into the use of some complex linguistic features with a context-sensitive TC method. Our experimental results show some potential advantages of such an approach.
  • Keywords
    classification; computational linguistics; context-sensitive languages; text analysis; automatic document classification; complex linguistic feature; context-sensitive text classification; learned document feature; semantic feature; word occurrence; Cybernetics; Electronic mail; Feature extraction; Information retrieval; Machine learning; Machine learning algorithms; Robustness; Text categorization; Text processing; Text classification; complex linguistics feature; context-sensitive; semantics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
  • Conference_Location
    Guangzhou, China
  • Print_ISBN
    0-7803-9091-1
  • Type

    conf

  • DOI
    10.1109/ICMLC.2005.1527491
  • Filename
    1527491