DocumentCode :
3402290
Title :
New Feature Selection Algorithm based on Potential Difference
Author :
Liu, Guangyuan ; Liu, Yu ; Dong, Liyan ; Yuan, Senmiao ; Li, Yongli
Author_Institution :
Jilin Univ., Changchun, Jilin
fYear :
2007
fDate :
5-8 Aug. 2007
Firstpage :
566
Lastpage :
570
Abstract :
The new Potential Difference Algorithm for feature selection is a data preprocessing algorithm. Data preprocessing is one of the research topics in data mining. Raw data is usually just a collection of uninterpreted numbers, and decisions cannot be made directly from it. Data mining and data analysis algorithms need preprocessed data, and the quality of that preprocessing determines how well their results reflect the real-world situation. Feature selection is a popular topic in data preprocessing research. In this paper, a new feature selection algorithm based on potential difference is proposed. The χ² statistic is used as the quantitative measure of correlation, and the χ² table is used to obtain the confidence level for independence. For a specific feature subset, two lists are built: one ranks all features in descending order of their correlation with the class, and the other ranks all features in descending order of their correlation with a reference feature. Features are then selected according to the difference between each feature's positions in the two lists. Finally, the paper presents a theoretical analysis, together with experimental results and their analysis, based on sample data from a mobile company in China. The algorithm keeps the same data-analysis accuracy while reducing the number of data dimensions (degrees of freedom). In this way it avoids the exponential growth in computation time that high-dimensional data causes in data analysis and data mining, without losing accuracy. The experimental results show that two factors affect the accuracy of the algorithm: the accuracy of the discretization and the accuracy of the χ² table. The more accurate the discretization and the table, the more accurate the selected features.
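The abstract outlines the procedure only at a high level: correlation is measured with a statistic checked against a table at a confidence level, two descending lists are built (class vs. features, reference feature vs. features), and features are chosen from their positions in the two lists. The Python sketch below is one possible reading of those steps, not the authors' algorithm. It assumes Pearson's chi-square test of independence on already-discretized features, a 95% confidence level, that the reference feature is the one most correlated with the class, and that the "potential difference" can be approximated as (position in the reference list) minus (position in the class list). All names (`chi_square`, `is_dependent`, `select_features`, `CHI2_CRIT_95`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Upper-5% critical values of the chi-square distribution (95% confidence),
# keyed by degrees of freedom: a small excerpt of a standard chi-square table.
CHI2_CRIT_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def chi_square(x: Sequence, y: Sequence) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for two discrete columns."""
    n = len(x)
    joint = Counter(zip(x, y))
    cx, cy = Counter(x), Counter(y)
    stat = 0.0
    for a in cx:
        for b in cy:
            expected = cx[a] * cy[b] / n  # cell count expected under independence
            stat += (joint[(a, b)] - expected) ** 2 / expected
    dof = (len(cx) - 1) * (len(cy) - 1)
    return stat, dof


def is_dependent(stat: float, dof: int) -> bool:
    """Reject independence at the 95% level (assumes dof is in CHI2_CRIT_95)."""
    return dof > 0 and stat > CHI2_CRIT_95[dof]


def descending(scores: Dict[str, float]) -> List[str]:
    """Names sorted by score, highest first; ties keep their input order."""
    return sorted(scores, key=lambda f: scores[f], reverse=True)


def select_features(data: Dict[str, Sequence], target: Sequence, k: int) -> List[str]:
    """Hypothetical rank-difference selection; returns at most k feature names."""
    # List 1: features significantly dependent on the class, by chi-square.
    class_chi = {}
    for name, col in data.items():
        stat, dof = chi_square(col, target)
        if is_dependent(stat, dof):
            class_chi[name] = stat
    class_list = descending(class_chi)
    if not class_list:
        return []
    reference = class_list[0]  # assumption: most class-correlated feature
    candidates = class_list[1:]
    # List 2: the same candidates ordered by chi-square against the reference.
    ref_list = descending(
        {f: chi_square(data[f], data[reference])[0] for f in candidates})
    class_pos = {f: i for i, f in enumerate(candidates)}
    ref_pos = {f: i for i, f in enumerate(ref_list)}
    # A positive difference means: high in the class list but low in the
    # reference list, i.e. relevant to the class and not redundant with it.
    diff = {f: ref_pos[f] - class_pos[f] for f in candidates}
    ranked = sorted(candidates, key=lambda f: (-diff[f], class_pos[f]))
    return ([reference] + ranked)[:k]


if __name__ == "__main__":
    y = [0, 0, 0, 0, 1, 1, 1, 1] * 4
    data = {
        "a": [0, 0, 0, 1, 1, 1, 1, 1] * 4,
        "b": [0, 0, 0, 1, 1, 1, 1, 1] * 4,  # exact duplicate of "a"
        "c": [0, 1, 1, 0, 1, 1, 1, 1] * 4,
        "d": [0, 0, 0, 1, 1, 1, 1, 0] * 4,
        "noise": [0, 1, 0, 1, 0, 1, 0, 1] * 4,
    }
    print(select_features(data, y, 2))  # ['a', 'c']
```

In the toy data, "noise" fails the 95% test, "a" becomes the reference, and "c" is preferred over the more class-correlated "b" because "c" sits low in the reference list while "b" duplicates "a". Using list positions rather than raw χ² values follows the abstract's wording; a real implementation might weight the two statistics differently.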
Keywords :
correlation methods; data analysis; data mining; feature extraction; statistical analysis; correlation descendent list; data preprocessing algorithm; feature selection algorithm; potential difference algorithm; Automation; Computer science; Data analysis; Data mining; Data preprocessing; Decision trees; Educational institutions; Mechatronics; Statistics; USA Councils; Correlation Probability; Feature Selection; Potential Difference; statistic
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Mechatronics and Automation, 2007. ICMA 2007. International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-0828-3
Electronic_ISBN :
978-1-4244-0828-3
Type :
conf
DOI :
10.1109/ICMA.2007.4303605
Filename :
4303605