• DocumentCode
    3730434
  • Title
    Minimum Information Quantity Partition based fast semantic feature subset selection
  • Author
    Donglin Cao; Yanping Lv; Dazhen Lin
  • Author_Institution
    Cognitive Science Department, Xiamen University, China, 361005
  • fYear
    2015
  • Firstpage
    692
  • Lastpage
    699
  • Abstract
    Processing web text clustering data usually yields more than ten thousand features. Traditional dimensionality reduction methods and optimal feature subset selection methods increase the time complexity. To reduce the time complexity, we propose a Minimum Information Quantity Partition (MIQP) method. First, MIQP selects a useful feature subset by determining the best partition according to the diminishing trend of the feature weight curve. Second, to remove the feature independence assumption and compute the semantic relations between the selected features, Latent Semantic Indexing (LSI) is used to eliminate noisy data and recover the missing semantics of each sample. This approach reduces the time complexity from O(mn^3) to O(mn^2). The experimental results show that the performance of MIQP is close to the best clustering results obtained by selecting the top k features, and that MIQP is much faster than clustering with all features on our experimental data.
  • Keywords
    "Semantics","Time complexity","Cognitive science","Intelligent systems","Indexing","Noise measurement","Error analysis"
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery (FSKD), 2015 12th International Conference on
  • Type
    conf
  • DOI
    10.1109/FSKD.2015.7382026
  • Filename
    7382026