• DocumentCode
    3402290
  • Title
    New Feature Selection Algorithm based on Potential Difference
  • Author
    Liu, Guangyuan ; Liu, Yu ; Dong, Liyan ; Yuan, Senmiao ; Li, Yongli
  • Author_Institution
    Jilin Univ., Changchun, Jilin
  • fYear
    2007
  • fDate
    5-8 Aug. 2007
  • Firstpage
    566
  • Lastpage
    570
  • Abstract
    The Potential Difference Algorithm for feature selection is a new data preprocessing algorithm. Data preprocessing is one of the research topics in data mining. Raw data is normally just a collection of numbers with no evident meaning, and decisions cannot be made from it directly. Data mining and data analysis algorithms therefore need preprocessed data, and the quality of that preprocessing affects how well the results reflect the real-world situation. Feature selection is a popular topic in the study of data preprocessing. This paper proposes a new feature selection algorithm based on potential difference. A statistic is used as the quantitative measure of correlation, and the independence confidence level is obtained from the corresponding table. Two lists are built for a specific feature subset: one lists the correlation between the class and all features in descending order, and the other lists the correlation between a reference feature and all features in descending order. Feature selection is then carried out based on the difference between each feature's positions in the two lists. Finally, the paper provides a theoretical analysis, together with experimental results and their analysis, on sample data from a mobile company in China. The algorithm keeps the same data analysis accuracy with fewer degrees of freedom (dimensions) in the data. It thereby avoids the exponential growth in time cost that high-dimensional data imposes on data analysis and data mining. The experimental results show that two factors affect the accuracy of the algorithm: the accuracy of the discretization and the accuracy of the table. The more accurate the discretization and the table, the more accurately features are selected.
  • Keywords
    correlation methods; data analysis; data mining; feature extraction; statistical analysis; correlation descendent list; data preprocessing algorithm; feature selection algorithm; potential difference algorithm; Automation; Computer science; Data analysis; Data mining; Data preprocessing; Decision trees; Educational institutions; Mechatronics; Statistics; USA Councils; Correlation Probability; Feature Selection; Potential Difference; statistic
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Mechatronics and Automation, 2007. ICMA 2007. International Conference on
  • Conference_Location
    Harbin
  • Print_ISBN
    978-1-4244-0828-3
  • Electronic_ISBN
    978-1-4244-0828-3
  • Type
    conf
  • DOI
    10.1109/ICMA.2007.4303605
  • Filename
    4303605
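
The two-list comparison described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. The abstract does not name the statistic or the selection rule, so several things here are assumptions: the Pearson chi-square statistic is used as the correlation measure, features are assumed to be already discretized, the function names (`chi2_stat`, `descending_ranks`, `position_differences`) are hypothetical, and the sign convention for the position difference is a guess. The confidence-level lookup from a statistical table is not shown.

```python
from collections import Counter


def chi2_stat(xs, ys):
    """Pearson chi-square statistic for two equal-length discrete sequences.

    Assumption: the abstract only says "statistic"; chi-square is our choice.
    """
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    stat = 0.0
    for a in count_x:
        for b in count_y:
            expected = count_x[a] * count_y[b] / n
            observed = count_xy.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def descending_ranks(scores):
    """Map each name to its 0-based position in a descending sort by score.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: pos for pos, name in enumerate(ordered)}


def position_differences(features, labels, reference):
    """Position in the reference-feature list minus position in the class list.

    Assumed convention: a large positive value means the feature ranks high
    against the class but low against the reference feature.
    """
    class_scores = {f: chi2_stat(v, labels) for f, v in features.items()}
    ref_scores = {f: chi2_stat(v, features[reference]) for f, v in features.items()}
    class_rank = descending_ranks(class_scores)
    ref_rank = descending_ranks(ref_scores)
    return {f: ref_rank[f] - class_rank[f] for f in features}


if __name__ == "__main__":
    # Toy, made-up data: "a" copies the class labels, "b" is independent of them.
    labels = [0, 0, 1, 1, 0, 0, 1, 1]
    features = {"a": list(labels), "b": [0, 1, 0, 1, 0, 1, 0, 1]}
    print(position_differences(features, labels, reference="b"))
```

How the position differences are turned into a final feature subset (a threshold, a top-k cut, or something else) is not stated in the abstract, so the sketch stops at computing the differences.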