DocumentCode
3730434
Title
Minimum Information Quantity Partition based fast semantic feature subset selection
Author
Donglin Cao; Yanping Lv; Dazhen Lin
Author_Institution
Cognitive Science Department, Xiamen University, China, 361005
fYear
2015
Firstpage
692
Lastpage
699
Abstract
Web text clustering data usually yields more than ten thousand features. Traditional dimensionality reduction methods and optimal feature subset selection methods increase the time complexity. To reduce the time complexity, we propose a Minimum Information Quantity Partition (MIQP) method. First, MIQP selects a useful feature subset by determining the best partition according to the diminishing trend of the feature weight curve. Second, to remove the feature independence assumption and compute the semantic relations between the selected features, Latent Semantic Indexing (LSI) is used to eliminate noisy data and recover the missing semantics of each sample. This approach reduces the time complexity from O(mn^3) to O(mn^2). The experimental results show that the performance of MIQP is close to the best clustering results obtained by selecting the top k features, and that MIQP is much faster than clustering with all features on our experimental data.
Keywords
"Semantics","Time complexity","Cognitive science","Intelligent systems","Indexing","Noise measurement","Error analysis"
Publisher
IEEE
Conference_Titel
Fuzzy Systems and Knowledge Discovery (FSKD), 2015 12th International Conference on
Type
conf
DOI
10.1109/FSKD.2015.7382026
Filename
7382026