• DocumentCode
    124171
  • Title

    Feature Selection and Term Weighting

  • Author

    Algarni, Abdulmohsen ; Tairan, Nasser

  • Author_Institution
    Coll. of Comput. Sci., King Khalid Univ., Abha, Saudi Arabia
  • Volume
    1
  • fYear
    2014
  • fDate
    11-14 Aug. 2014
  • Firstpage
    336
  • Lastpage
    339
  • Abstract
    Term-based approaches can extract many features from text documents, but most include noise. Many popular text-mining techniques have been adapted to reduce the noisy information in extracted features, but the results still contain some noisy features. However, the noisy features are extracted from the same training documents as the good features. Therefore, the main problem is that some training documents contain a large amount of noisy data. Reducing the noisy data in the training documents would help to reduce the noise in the extracted features. Moreover, we believe that removing some training documents (those that contain more noisy data than useful data) can help to improve the effectiveness of the classifier. Clustering methods can help to reduce the effect of noisy data. The main problem of clustering is defined as that of finding groups of similar objects in the data. In this paper, we introduce a methodology that uses a clustering algorithm to group the training data before using it. We also test our hypothesis that not all training documents are useful for training the classifier.
  • Keywords
    data mining; data reduction; feature extraction; feature selection; pattern classification; pattern clustering; text analysis; classifier; clustering algorithm; group training data; noise data reduction; noisy information reduction; term weighting approach; text documents; text-mining techniques; training documents; Frequency measurement; Information retrieval; Noise; Text categorization; Training; Text mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
  • Conference_Location
    Warsaw
  • Type

    conf

  • DOI
    10.1109/WI-IAT.2014.53
  • Filename
    6927562