Title :
A new sampling technique and SVM classification for feature selection in high-dimensional Imbalanced dataset
Author :
Deepa, T. ; Punithavalli, M.
Author_Institution :
Comput. Sci. Dept., Karpagam Univ., Coimbatore, India
Abstract :
Feature selection in a high-dimensional imbalanced dataset (one in which one class greatly outnumbers the other) is a challenging task in data mining. Feature selection refers to selecting a subset of features from the original dataset. This paper focuses on two problems: (i) balancing the dataset and (ii) extracting the features. A new technique called the Evolutionary Sampling Technique (EST) is developed to balance the dataset, and Support Vector Machine (SVM) classification is used to calculate the accuracy and to overcome the overfitting problem while sampling the dataset. The techniques are evaluated on a microarray dataset.
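The abstract does not specify how EST encodes or scores a sample, so the following is only a minimal sketch of one common form of evolutionary undersampling, not the authors' method. It assumes: a genetic algorithm over bit masks that select which majority-class samples to keep (all minority samples are always kept); a fitness equal to the balanced accuracy of a linear SVM on a held-out validation set, minus a penalty (hypothetical weight `alpha`) for remaining class imbalance; and labels encoded as +1/-1. To stay dependency-free, the SVM is swapped for a Pegasos-style sub-gradient linear SVM. Every function name and parameter here (`train_linear_svm`, `evolutionary_undersample`, `p_mut`, `alpha`, `make_toy_data`, ...) is hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Pegasos-style sub-gradient linear SVM (a swapped-in stand-in, labels +1/-1).

    A constant 1.0 is appended to each sample so the bias is learned as an
    extra (also regularized) weight.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], lab) for x, lab in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, lab in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = lab * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward this sample
                w = [wj + eta * lab * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) >= 0 else -1


def balanced_accuracy(w, X, y):
    """Mean per-class recall, so the majority class cannot dominate the score."""
    recalls = []
    for label in set(y):
        idx = [i for i, lab in enumerate(y) if lab == label]
        recalls.append(sum(predict(w, X[i]) == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def evolutionary_undersample(X, y, X_val, y_val, pop_size=16, generations=15,
                             p_mut=0.02, alpha=0.2, seed=0):
    """GA over bit masks choosing which majority samples to keep; returns indices."""
    rng = random.Random(seed)
    minority_label, majority_label = sorted(set(y), key=y.count)
    minority = [i for i, lab in enumerate(y) if lab == minority_label]
    majority = [i for i, lab in enumerate(y) if lab == majority_label]
    cache = {}

    def fitness(mask):
        if mask not in cache:
            keep = minority + [j for j, bit in zip(majority, mask) if bit]
            w = train_linear_svm([X[i] for i in keep], [y[i] for i in keep])
            imbalance = abs(sum(mask) - len(minority)) / len(majority)
            cache[mask] = balanced_accuracy(w, X_val, y_val) - alpha * imbalance
        return cache[mask]

    # Start near a balanced subset: keep each majority sample with prob n_min/n_maj.
    p_keep = len(minority) / len(majority)
    pop = [tuple(int(rng.random() < p_keep) for _ in majority) for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=fitness)]  # elitism: carry the best mask over
        while len(children) < pop_size:
            a = max(rng.sample(pop, 2), key=fitness)  # binary tournament selection
            b = max(rng.sample(pop, 2), key=fitness)
            # Uniform crossover followed by bit-flip mutation.
            child = tuple((ga if rng.random() < 0.5 else gb) ^ (rng.random() < p_mut)
                          for ga, gb in zip(a, b))
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return minority + [j for j, bit in zip(majority, best) if bit]


def make_toy_data(n_min, n_maj, seed):
    """Hypothetical 2-D toy data: minority (+1) near (2, 2), majority (-1) near (-2, -2)."""
    rng = random.Random(seed)
    X = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(n_min)]
    X += [(rng.gauss(-2, 1), rng.gauss(-2, 1)) for _ in range(n_maj)]
    return X, [1] * n_min + [-1] * n_maj


if __name__ == "__main__":
    X, y = make_toy_data(10, 90, seed=1)
    X_val, y_val = make_toy_data(10, 90, seed=2)
    keep = evolutionary_undersample(X, y, X_val, y_val)
    print("kept", len(keep), "of", len(X), "training samples")
```

Scoring candidates on a separate validation set rather than on the samples used for training is one simple way to keep the sampler from rewarding subsets that merely overfit, which is the concern the abstract raises; a real microarray study would more likely use cross-validation and a kernel SVM from an established library.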
Keywords :
data mining; genetic algorithms; pattern classification; sampling methods; support vector machines; SVM classification; evolutionary sampling technique; feature selection; imbalanced dataset; microarray dataset; Cancer; Data mining; Earth Observing System; Feature extraction; Genetic algorithms; Machine learning; Support vector machines; Evolutionary oversampling and Undersampling; Feature selection; Genetic algorithm; Imbalanced dataset; Support vector machine (SVM)
Conference_Title :
2011 3rd International Conference on Electronics Computer Technology (ICECT)
Conference_Location :
Kanyakumari
Print_ISBN :
978-1-4244-8678-6
Electronic_ISBN :
978-1-4244-8679-3
DOI :
10.1109/ICECTECH.2011.5942028