Title :
Effective data mining by integrating genetic algorithm into the data preprocessing phase
Author :
Gopalan, Janaki ; Korkmaz, Erkan ; Alhajj, Reda ; Barker, Ken
Author_Institution :
Dept. of Comput. Sci., Calgary Univ., Alta., Canada
Abstract :
Dividing a data set into a training set and a test set is a fundamental step in the preprocessing phase of data mining (DM), and the choice of the training set is an important factor in deriving good classification rules. The traditional approach to association rule mining divides the dataset into a training set and a test set using statistical methods. In this paper, we highlight the weaknesses of the existing approach and propose a new methodology that employs a genetic algorithm (GA) in the process. In our approach, the original dataset is first divided into a sample set and a validation set. A GA is then used to find an appropriate split of the sample set into training and test sets. We demonstrate through experiments that using the resulting training set as the input to an association rule mining algorithm generates high-accuracy classification rules, whose accuracy is then tested on the validation set. The results are satisfactory and demonstrate the applicability and effectiveness of our approach.
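The two-stage procedure in the abstract (hold out a validation set, then let a GA search for a training/test split of the remaining sample set) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the bitmask encoding, truncation selection, one-point crossover, the mutation rate, the toy data, and above all the one-attribute majority-vote classifier, which merely stands in for the association rule mining algorithm used in the paper.

```python
import random

def one_rule_fit(rows):
    """Stand-in classifier (NOT the paper's rule miner): value -> majority label."""
    counts = {}
    for value, label in rows:
        counts.setdefault(value, {}).setdefault(label, 0)
        counts[value][label] += 1
    return {v: max(c, key=c.get) for v, c in counts.items()}

def accuracy(rules, rows, default=0):
    """Fraction of rows whose label matches the rule for their value."""
    if not rows:
        return 0.0
    hits = sum(1 for value, label in rows if rules.get(value, default) == label)
    return hits / len(rows)

def fitness(mask, sample):
    """Score a split: bit 1 = training row, bit 0 = test row (assumed encoding)."""
    train = [r for r, bit in zip(sample, mask) if bit]
    test = [r for r, bit in zip(sample, mask) if not bit]
    if not train or not test:
        return 0.0  # degenerate splits are never preferred
    return accuracy(one_rule_fit(train), test)

def ga_split(sample, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Search for a training/test bitmask over the sample set with a simple GA."""
    rng = random.Random(seed)
    n = len(sample)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(m, sample), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection, keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda m: fitness(m, sample))

# Hypothetical toy data: (attribute value, class label) pairs with 10% label noise.
data_rng = random.Random(1)
data = [(v, v % 2 if data_rng.random() > 0.1 else 1 - v % 2)
        for v in (data_rng.randrange(5) for _ in range(60))]

# Stage 1: hold out a validation set that the GA never sees.
sample, validation = data[:48], data[48:]

# Stage 2: the GA picks the training/test split of the sample set.
mask = ga_split(sample)
train = [row for row, bit in zip(sample, mask) if bit]
rules = one_rule_fit(train)
val_acc = accuracy(rules, validation)
```

Because the GA optimizes accuracy on the test part of the split, that score is optimistic by construction; the untouched validation set is what gives an unbiased estimate, which is the role the abstract assigns to it.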
Keywords :
data mining; genetic algorithms; pattern classification; association rules mining algorithm; classification rules; data mining preprocessing; data test set; data training set; genetic algorithm; statistical method; validation set; Association rules; Classification algorithms; Computer science; Data engineering; Data preprocessing; Databases; Delta modulation; Testing; association; classification; clustering; data-splitting; pre-processing
Conference_Title :
Machine Learning and Applications, 2005. Proceedings. Fourth International Conference on
Print_ISBN :
0-7695-2495-8
DOI :
10.1109/ICMLA.2005.26