A sequential cosine similarity based feature selection technique for high dimensional datasets

Author

Vimal Kumar Dubey;Amit Kumar Saxena

Author_Institution

Department of Computer Science and Information Technology, Guru Ghasidas Vishwavidyalaya, Bilaspur, Chattisgarh, India, 495009

fYear

2015

Firstpage

1

Lastpage

5

Abstract

With the routine use of information processing throughout society, databases have grown tremendously in size. In most cases, not all parameters (here called features) are needed to decide the outcome (or decision) of an instance, so feature selection is an important step in data processing. This paper presents a novel feature selection method. The cosine similarity between each individual feature of the database and the class is computed, and the features are stored in an array in descending order of similarity. The first feature of this array is then combined with the remaining features sequentially, one at a time. If adding a feature increases the classification accuracy of the combination, the combination is accepted; otherwise the feature responsible is removed from the combination. In this manner all features are tested and a final subset of features is obtained. Results from rigorous experiments with the proposed method on high-dimensional databases, compared with other previously reported methods, are encouraging. The proposed method is therefore recommended for high-dimensional data processing.
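The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not name the classifier or the evaluation protocol, so leave-one-out 1-nearest-neighbour accuracy is used here purely as an assumption. The function names (`cosine_similarity`, `loo_1nn_accuracy`, `sequential_cosine_selection`) and the toy data are hypothetical, and class labels are assumed to be numeric so that a feature column can be compared with the class vector directly.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out 1-NN accuracy using only the given feature indices.

    Assumption: the abstract does not specify the classifier; 1-NN with
    squared Euclidean distance is used only to keep the sketch small.
    """
    n = len(X)
    correct = 0
    for i in range(n):
        best_j, best_d = None, float("inf")
        for j in range(n):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        if y[best_j] == y[i]:
            correct += 1
    return correct / n


def sequential_cosine_selection(X, y, evaluate=loo_1nn_accuracy):
    """Rank features by cosine similarity with the class, then add them
    one at a time, keeping a feature only if accuracy increases."""
    n_features = len(X[0])
    columns = [[row[f] for row in X] for f in range(n_features)]
    # Feature indices in descending order of similarity with the class.
    ranked = sorted(
        range(n_features),
        key=lambda f: cosine_similarity(columns[f], y),
        reverse=True,
    )
    selected = [ranked[0]]  # the top-ranked feature is always kept
    best_acc = evaluate(X, y, selected)
    for f in ranked[1:]:
        candidate = selected + [f]
        acc = evaluate(X, y, candidate)
        if acc > best_acc:  # accept only a strict improvement
            selected, best_acc = candidate, acc
    return selected, best_acc


# Toy data (hypothetical): three features, two classes labelled 1 and 2.
X_demo = [
    [1, 5, 0],
    [1, 1, 0],
    [2, 4, 0],
    [2, 2, 2],
    [3, 5, 2],
    [3, 1, 2],
]
y_demo = [1, 1, 1, 2, 2, 2]

selected, accuracy = sequential_cosine_selection(X_demo, y_demo)
print(selected, accuracy)  # prints: [0, 2] 1.0
```

On this toy data, feature 0 ranks first and is kept, feature 2 raises the accuracy and is accepted, and feature 1 is rejected because it does not improve on the current subset. Any other accuracy estimate can be passed through the `evaluate` parameter, provided it takes the data, the labels, and a list of feature indices.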

Keywords

"Databases","Feature extraction","Classification algorithms","Testing","Robustness","Brain models"

Publisher

IEEE

Conference_Titel

Systems Conference (NSC), 2015 39th National

Type

conf

DOI

10.1109/NATSYS.2015.7489113

Filename

7489113