Regional Information Center for Science and Technology - New Feature Selection Algorithm based on Potential Difference

DocumentCode :

3402290

Title :

New Feature Selection Algorithm based on Potential Difference

Author :

Liu, Guangyuan ; Liu, Yu ; Dong, Liyan ; Yuan, Senmiao ; Li, Yongli

Author_Institution :

Jilin Univ., Changchun, Jilin

fYear :

2007

fDate :

5-8 Aug. 2007

Firstpage :

566

Lastpage :

570

Abstract :

The Potential Difference Algorithm for feature selection is a new data pre-processing algorithm. Data pre-processing is one of the research topics in data mining. Raw data is normally just a collection of numbers with no evident meaning, and decisions cannot be made from it directly. Data mining and data analysis algorithms need pre-processed data, and the quality of that pre-processing affects how well the results reflect the real-world situation. Feature selection is a widely studied part of data pre-processing. This paper proposes a new feature selection algorithm based on potential difference. A statistic is used as the quantitative measure of correlation, and the independence confidence level is obtained from the corresponding statistical table. Two lists are built for a specific feature subset: one ranks the correlation between the class and every feature in descending order, and the other ranks the correlation between a reference feature and every feature in descending order. Feature selection is then carried out based on the difference between each feature's positions in the two lists. Finally, the paper gives a theoretical analysis, together with experimental results and their analysis, on sample data from a mobile company in China. The algorithm maintains the same data analysis accuracy with fewer data dimensions (degrees of freedom), which avoids the exponential growth in time cost that data analysis and data mining incur on high-dimensional data. The experimental results show that two factors affect the accuracy of the algorithm: the accuracy of the discretization and the accuracy of the statistical table. The more accurate both are, the more accurately features are selected.
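The two-list idea in the abstract can be illustrated with a minimal sketch. The abstract names neither the statistic nor the selection rule, so several things here are assumptions rather than the authors' method: the Pearson chi-square statistic stands in for the correlation measure, the "potential difference" is read as the difference between a feature's positions in the two descending lists, and the function names and toy data are hypothetical. The step that turns these differences into a selected subset is left out because the abstract does not describe it.

```python
from collections import Counter

def chi_square(xs, ys):
    """Pearson chi-square statistic for two discrete sequences of equal length.
    Assumption: the record does not say which statistic the paper uses."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    stat = 0.0
    for x, rx in row_totals.items():
        for y, cy in col_totals.items():
            expected = rx * cy / n
            observed = joint.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

def descending_positions(scores):
    """Map each feature name to its position in a descending list (0 = strongest).
    Ties are broken by name so the ordering is deterministic."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: pos for pos, name in enumerate(ordered)}

def position_differences(features, labels, reference):
    """Hypothetical reading of 'potential difference': for each feature, its
    position in the reference-feature list minus its position in the class list."""
    class_scores = {f: chi_square(col, labels) for f, col in features.items()}
    ref_scores = {f: chi_square(col, features[reference]) for f, col in features.items()}
    class_pos = descending_positions(class_scores)
    ref_pos = descending_positions(ref_scores)
    return {f: ref_pos[f] - class_pos[f] for f in features}

# Toy, already-discretized data (illustrative only, not from the paper).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
    "c": [0, 0, 0, 1, 1, 1, 1, 1],
}
diffs = position_differences(features, labels, reference="b")
```

Under this reading, a large positive difference marks a feature that ranks high against the class but low against the reference feature. Whether the paper selects or discards features on that basis cannot be determined from the abstract alone.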

Keywords :

correlation methods; data analysis; data mining; feature extraction; statistical analysis; correlation descendent list; data preprocessing algorithm; feature selection algorithm; potential difference algorithm; Automation; Computer science; Data preprocessing; Decision trees; Educational institutions; Mechatronics; Statistics; USA Councils; Correlation Probability; Feature Selection; Potential Difference; statistic

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Mechatronics and Automation, 2007. ICMA 2007. International Conference on

Conference_Location :

Harbin

Print_ISBN :

978-1-4244-0828-3

Electronic_ISBN :

978-1-4244-0828-3

Type :

conf

DOI :

10.1109/ICMA.2007.4303605

Filename :

4303605

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3402290