DocumentCode
3425296
Title
Effective data mining by integrating genetic algorithm into the data preprocessing phase
Author
Gopalan, Janaki ; Korkmaz, Erkan ; Alhajj, Reda ; Barker, Ken
Author_Institution
Dept. of Comput. Sci., Calgary Univ., Alta., Canada
fYear
2005
fDate
15-17 Dec. 2005
Abstract
Dividing a data set into a training set and a test set is a fundamental step in the preprocessing phase of data mining (DM), and the choice of training set is an important factor in deriving good classification rules. The traditional approach to association rule mining divides the data set into training and test sets using statistical methods. In this paper, we highlight the weaknesses of the existing approach and propose a new methodology that employs a genetic algorithm (GA) in the process. In our approach, the original data set is divided into a sample set and a validation set. The GA is then used to find an appropriate split of the sample set into training and test sets. We demonstrate through experiments that using the resulting training set as the input to an association rule mining algorithm generates high-accuracy classification rules, which are then tested for accuracy on the validation set. The results are very satisfactory and demonstrate the applicability and effectiveness of our approach.
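The two-stage split described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not state the chromosome encoding, the fitness function, or the GA operators, so the bit-mask encoding, the test-set-accuracy fitness, the one-point crossover, and the toy rule learner `fit_rules` (a stand-in for a real association rule miner) are all assumptions, as are the function names and the synthetic data.

```python
import random

def fit_rules(rows):
    """Stand-in rule learner (assumption): map each value of feature 0 to its majority label."""
    counts = {}
    for features, label in rows:
        by_label = counts.setdefault(features[0], {})
        by_label[label] = by_label.get(label, 0) + 1
    return {value: max(labels, key=labels.get) for value, labels in counts.items()}

def accuracy(rules, rows):
    """Fraction of rows whose label matches the rule for their feature value."""
    if not rows:
        return 0.0
    hits = sum(1 for features, label in rows if rules.get(features[0]) == label)
    return hits / len(rows)

def fitness(mask, sample):
    """Assumed fitness: accuracy on the test part of rules learned from the training part."""
    train = [row for bit, row in zip(mask, sample) if bit]
    test = [row for bit, row in zip(mask, sample) if not bit]
    if not train or not test:
        return 0.0  # degenerate split: everything on one side
    return accuracy(fit_rules(train), test)

def evolve_split(sample, generations=30, pop_size=20, mutation_rate=0.05, seed=0):
    """Search for a train/test bit mask over the sample set (1 = training, 0 = test)."""
    rng = random.Random(seed)
    n = len(sample)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(m, sample), reverse=True)
        parents = ranked[: pop_size // 2]          # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)              # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda m: fitness(m, sample))

# Synthetic data (assumption): one binary feature that predicts the label with some noise.
data_rng = random.Random(1)
rows = []
for _ in range(60):
    x = data_rng.randint(0, 1)
    label = x if data_rng.random() < 0.8 else 1 - x
    rows.append(((x,), label))

sample, validation = rows[:40], rows[40:]          # stage 1: sample set vs. validation set
best_mask = evolve_split(sample)                   # stage 2: GA splits the sample set
training = [row for bit, row in zip(best_mask, sample) if bit]
print("validation accuracy:", accuracy(fit_rules(training), validation))
```

The validation set is never seen by the search, so the final accuracy figure is measured on data that played no part in choosing the split, which mirrors the role the abstract gives to the validation set.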
Keywords
data mining; genetic algorithms; pattern classification; association rules mining algorithm; classification rules; data mining preprocessing; data test set; data training set; genetic algorithm; statistical method; validation set; Association rules; Classification algorithms; Computer science; Data engineering; Data mining; Data preprocessing; Databases; Delta modulation; Genetic algorithms; Testing; association; classification; clustering; data mining; data-splitting; genetic algorithms; pre-processing
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Applications, 2005. Proceedings. Fourth International Conference on
Print_ISBN
0-7695-2495-8
Type
conf
DOI
10.1109/ICMLA.2005.26
Filename
1607471