  • DocumentCode
    424117
  • Title
    Efficient feature selection for high-dimensional data using two-level filter
  • Author
    Li, Yun ; Wu, Zhong-Fu ; Liu, Jia-Min ; Tang, Yan-Yun
  • Author_Institution
    Dept. of Comput., Chongqing Univ., China
  • Volume
    3
  • fYear
    2004
  • fDate
    26-29 Aug. 2004
  • Firstpage
    1711
  • Abstract
    Feature selection is a key problem in pattern recognition and machine learning, and finding the optimal feature subset is difficult because the problem is NP-hard. In many applications, such as information retrieval, the dimensionality of the feature set or instance set is now very high, so feature selection from high-dimensional data is also an urgent task for researchers. This paper presents a new approach, a two-level filter model system that integrates the Relief algorithm with a newly developed feature clustering algorithm, to reduce the dimensionality of large-scale feature sets using feature correlation (relevance), including both feature-feature correlation and feature-class correlation. Our major contributions are: (1) presenting a system that performs feature selection on high-dimensional data; (2) analyzing how the system architecture should change according to the time cost of its components; (3) summarizing and commenting on methods for calculating feature correlation; and (4) performing experiments that show the effectiveness of the proposed approach. The experiments show that the complete system achieves a better compromise between dimensionality reduction and classification accuracy than either part of the system used alone. In many cases, it can improve both classification accuracy and dimensionality reduction.
  • Keywords
    computational complexity; correlation theory; feature extraction; filtering theory; learning (artificial intelligence); optimisation; pattern classification; pattern clustering; set theory; NP-hard problems; classification accuracy; dimensionality reduction; feature class correlation; feature clustering algorithm; feature selection; feature-feature correlation; high dimensional data; information retrieval; large scale feature set; machine learning; optimal feature subset; pattern recognition; system architecture; two level filter model system; Clustering algorithms; Costs; Educational institutions; Information retrieval; Large scale integration; Machine learning; Optical computing; Optical filters; Optical materials; Pattern recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of 2004 International Conference on Machine Learning and Cybernetics
  • Print_ISBN
    0-7803-8403-2
  • Type
    conf
  • DOI
    10.1109/ICMLC.2004.1382051
  • Filename
    1382051
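The abstract describes a two-level filter in which one level scores feature-class relevance (Relief) and the other removes feature-feature redundancy (feature clustering). The sketch below illustrates that general idea only; it is not the authors' method. Several assumptions apply. Relief runs first, but the paper itself analyzes reordering the levels by time cost. Level 1 is the classic two-class Relief with range-normalized Manhattan distances. The paper's own feature clustering algorithm is not reproduced: level 2 swaps in a simple greedy clustering based on an absolute Pearson correlation threshold. The names `relief_weights`, `cluster_filter`, `two_level_filter`, `relief_keep` and `corr_threshold` are hypothetical.

```python
import numpy as np


def relief_weights(X, y, n_samples=None, rng=None):
    """Level 1 (feature-class relevance): classic two-class Relief weights.

    Assumes every class has at least two instances. Features are scaled by
    their range so that per-feature differences lie in [0, 1].
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant features then contribute zero difference
    Xs = (X - X.min(axis=0)) / span
    m = n if n_samples is None else n_samples
    idx = rng.choice(n, size=m, replace=False) if m < n else np.arange(n)
    w = np.zeros(d)
    for i in idx:
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to all instances
        dist[i] = np.inf  # never pick the instance itself as a neighbour
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class instance
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class instance
        # Reward features that separate the miss, penalize ones that separate the hit.
        w += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / m
    return w


def cluster_filter(X, weights, corr_threshold=0.9):
    """Level 2 (feature-feature redundancy): greedy correlation clustering.

    This is a stand-in, not the paper's clustering algorithm. Features are
    visited in decreasing weight. A feature strongly correlated with an
    already kept representative joins that cluster and is dropped; otherwise
    it starts a new cluster and is kept. Returns column indices of X.
    """
    X = np.asarray(X, dtype=float)
    kept = []
    for f in np.argsort(-np.asarray(weights)):
        redundant = False
        for r in kept:
            c = np.corrcoef(X[:, f], X[:, r])[0, 1]
            if np.isfinite(c) and abs(c) >= corr_threshold:
                redundant = True
                break
        if not redundant:
            kept.append(int(f))
    return kept


def two_level_filter(X, y, relief_keep=10, corr_threshold=0.9, rng=0):
    """Run Relief, keep the top-weighted relevant features, then cluster them."""
    X = np.asarray(X, dtype=float)
    w = relief_weights(X, y, rng=rng)
    top = np.argsort(-w)[:relief_keep]
    top = top[w[top] > 0]  # drop features Relief considers irrelevant
    kept_local = cluster_filter(X[:, top], w[top], corr_threshold)
    return sorted(int(top[k]) for k in kept_local)  # map back to original columns
```

With a class-dependent feature and an affine copy of it, Relief should rank both above pure-noise columns. The clustering level should then keep only one of the two, because their correlation is 1. Running the cheaper level first shrinks the input to the quadratic pairwise-correlation step. This reflects the kind of time-cost trade-off the abstract mentions, though the paper's actual analysis may differ.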