DocumentCode :
3730434
Title :
Minimum Information Quantity Partition based fast semantic feature subset selection
Author :
Donglin Cao; Yanping Lv; Dazhen Lin
Author_Institution :
Cognitive Science Department, Xiamen University, China, 361005
fYear :
2015
Firstpage :
692
Lastpage :
699
Abstract :
Web text clustering data usually yields more than ten thousand features. Traditional dimensionality reduction methods and optimal feature subset selection methods increase the time complexity. To reduce it, we propose a Minimum Information Quantity Partition (MIQP) method. First, MIQP selects a useful feature subset by determining the best partition according to the diminishing trend of the feature weight curve. Second, to remove the feature independence assumption and compute the semantic relations between the selected features, Latent Semantic Indexing (LSI) is used to eliminate noisy data and recover the missing semantics of each sample. This approach reduces the time complexity from O(mn^3) to O(mn^2). Experimental results show that the performance of MIQP is close to the best clustering results obtained by selecting the top k features, and that MIQP is much faster than clustering with all features on our experimental data.
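The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state the MIQP partition criterion, so the cutoff below (the largest drop in the sorted feature weight curve) is a stand-in assumption, as are the summed-column feature weight and the names `select_by_weight_curve` and `lsi_project`. The LSI step is the standard truncated SVD.

```python
import numpy as np


def select_by_weight_curve(X, min_keep=1):
    """Pick a feature subset from the sorted feature weight curve.

    ASSUMPTION: feature weight = column sum, and the partition point is
    the largest drop between neighbouring sorted weights. The actual MIQP
    criterion is not given in the abstract.
    """
    weights = X.sum(axis=0)
    order = np.argsort(weights)[::-1]        # feature indices, heaviest first
    sorted_w = weights[order]
    drops = sorted_w[:-1] - sorted_w[1:]     # drops[i]: fall from rank i to i+1
    # Keep at least `min_keep` features, then cut at the largest drop.
    cut = int(np.argmax(drops[min_keep - 1:])) + min_keep
    return order[:cut]


def lsi_project(X, rank):
    """Standard LSI: project samples onto the top `rank` singular directions."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] * s[:rank]            # shape: (n_samples, rank)


# Toy sample-by-feature weight matrix (hypothetical data, 4 samples x 5 features).
X = np.array([
    [5.0, 4.0, 0.0, 1.0, 0.0],
    [4.0, 5.0, 1.0, 0.0, 0.0],
    [5.0, 5.0, 0.0, 0.0, 1.0],
    [4.0, 4.0, 1.0, 0.0, 0.0],
])

selected = select_by_weight_curve(X)         # stage 1: feature subset
Z = lsi_project(X[:, selected], rank=2)      # stage 2: latent semantic space
```

Running the SVD only on the selected columns, rather than on all features, is where the saving in this sketch comes from; `Z` can then be passed to any clustering routine.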
Keywords :
"Semantics","Time complexity","Cognitive science","Intelligent systems","Indexing","Noise measurement","Error analysis"
Publisher :
IEEE
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2015 12th International Conference on
Type :
conf
DOI :
10.1109/FSKD.2015.7382026
Filename :
7382026