DocumentCode :
1736644
Title :
A comparative study of various feature selection techniques in high-dimensional data set to improve classification accuracy
Author :
Shroff, Kandarp P. ; Maheta, Hardik H.
Author_Institution :
Dept. of Inf. Technol., Dharmsinh Desai Univ., Nadiad, India
fYear :
2015
Firstpage :
1
Lastpage :
6
Abstract :
The performance of a machine learning algorithm depends on the features considered from the dataset. A high-dimensional dataset degrades the performance of a learning algorithm because the algorithm tries to analyze and accommodate all the features. Feature selection is used as a pre-processing step to analyze and compress large datasets. Its main objective is to identify relevant features and remove redundant features from a high-dimensional dataset, with the goals of reducing the dimensionality of the dataset, improving classification accuracy, reducing computational cost, and enabling better visualization of the data. By considering only useful features, the performance of a classification algorithm can be improved. To select a reduced set of relevant features from the set of all features, various search techniques are used, such as complete search, random search, and sequential search. Each generated subset of features is validated using evaluation techniques such as the filter, wrapper, and hybrid approaches. The main aim of any search technique is to generate an optimal subset of features. In this paper, a general methodology of the feature selection process is summarized on the basis of search and evaluation techniques, and a comprehensive review of recent developments in feature selection techniques is provided. Classification with an appropriate feature selection technique has shown better performance in the field of machine learning.
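The search-plus-evaluation scheme described in the abstract can be illustrated with a minimal sketch. It pairs a sequential (forward) search with a filter-style evaluation. The scoring rule (mean absolute Pearson correlation with the label, minus mean pairwise correlation among the chosen features), the function names, and the toy data are all assumptions made for illustration; they are not taken from the paper.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def filter_score(columns, labels, subset):
    """Assumed filter criterion: mean relevance minus mean pairwise redundancy."""
    relevance = mean(abs(pearson(columns[j], labels)) for j in subset)
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    if not pairs:
        return relevance
    redundancy = mean(abs(pearson(columns[a], columns[b])) for a, b in pairs)
    return relevance - redundancy


def sequential_forward_selection(columns, labels, k):
    """Greedy sequential search: add the best-scoring feature until k are chosen."""
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        best = max(remaining,
                   key=lambda j: filter_score(columns, labels, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): one list per feature, one label per sample.
columns = [
    [1, 2, 3, 4, 5, 6],    # feature 0: strongly related to the label
    [1, 2, 3, 4, 5, 6.5],  # feature 1: near-duplicate of feature 0
    [1, 3, 2, 3, 2, 4],    # feature 2: weaker, but less redundant
]
labels = [0, 0, 0, 1, 1, 1]

chosen = sequential_forward_selection(columns, labels, k=2)
print(chosen)
```

A wrapper approach would keep the same search loop but replace `filter_score` with the cross-validated accuracy of a classifier trained on the candidate subset, which is typically more expensive to compute.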
Keywords :
data visualisation; feature selection; learning (artificial intelligence); pattern classification; classification accuracy; data visualization; feature selection techniques; high dimensional dataset; high-dimensional data set; machine learning algorithm; Accuracy; Classification algorithms; Filtering algorithms; Genetic algorithms; Machine learning algorithms; Search problems; Time complexity; Classification; Evaluation Techniques; Feature Selection; High dimensional datasets; Search Techniques;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Communication and Informatics (ICCCI), 2015 International Conference on
Conference_Location :
Coimbatore
Print_ISBN :
978-1-4799-6804-6
Type :
conf
DOI :
10.1109/ICCCI.2015.7218098
Filename :
7218098