DocumentCode
2914070
Title
Determine the Critical dimension in data mining (experiments with bioinformatics datasets)
Author
Suryakumar, Divya ; Sung, Andrew H. ; Liu, Qingzhong
Author_Institution
Dept. of Comput. Sci. & Eng., New Mexico Tech, Socorro, NM, USA
fYear
2011
fDate
22-24 Nov. 2011
Firstpage
481
Lastpage
486
Abstract
The "curse of dimensionality" problem, which occurs in many data mining applications such as biomedical informatics, digital forensics, and risk management, makes it difficult to develop accurate learning machine classifiers when the dataset includes too many irrelevant or insignificant features. Finding the smallest set of features necessary to obtain the most accurate classifier is therefore an issue of great theoretical and practical interest. As a step toward formal methods for finding the "critical dimension", this paper presents an empirical study of the minimum number of features required for a learning machine to perform accurately. The features of the dataset are first ranked; then, iteratively, the least important feature is removed and the performance is plotted against the number of features. The point at which the performance curve drops significantly and does not rise again gives the critical dimension, which is a unique number for each specific combination of learning machine and feature ranking method. The paper shows that the critical dimension phenomenon does exist for several of the bioinformatics datasets studied.
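The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `evaluate` callback (standing in for training and scoring a learning machine on a feature subset), and the `tolerance` used to decide what counts as a "significant" drop are all assumptions introduced here.

```python
from typing import Callable, Dict, Sequence


def performance_curve(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> Dict[int, float]:
    """Score the classifier while removing the least important feature each step.

    ranked_features is ordered from most to least important (hypothetical input;
    any feature ranking method could produce it).
    """
    curve: Dict[int, float] = {}
    kept = list(ranked_features)
    while kept:
        curve[len(kept)] = evaluate(kept)
        kept.pop()  # drop the lowest-ranked remaining feature
    return curve


def critical_dimension(curve: Dict[int, float], tolerance: float = 0.05) -> int:
    """Smallest feature count below which performance stays significantly lower.

    Assumption: "significantly" means more than `tolerance` below the score
    obtained with all features. Scanning upward from one feature, the first
    count that reaches that level marks where the curve last recovers.
    """
    floor = curve[max(curve)] - tolerance
    for n in sorted(curve):
        if curve[n] >= floor:
            return n
    raise ValueError("unreachable for a non-empty curve and tolerance >= 0")


if __name__ == "__main__":
    # Toy stand-in for a learning machine: only "a", "b", "c" carry signal.
    informative = {"a", "b", "c"}

    def toy_evaluate(subset: Sequence[str]) -> float:
        return 0.5 + 0.15 * len(informative & set(subset))

    curve = performance_curve(["a", "b", "c", "d", "e"], toy_evaluate)
    print(critical_dimension(curve))
```

In practice `evaluate` would wrap cross-validated training of the chosen learning machine, and the result would be specific to that learning machine and ranking method, as the abstract notes.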
Keywords
bioinformatics; data mining; formal verification; learning (artificial intelligence); pattern classification; bioinformatics datasets; biomedical informatics; critical dimension; curse of dimensionality problem; digital forensics; feature ranking method; formal methods; learning machine classifiers; risk management; Decision support systems; Intelligent systems; dimensionality reduction; feature or attribute reduction
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Systems Design and Applications (ISDA), 2011 11th International Conference on
Conference_Location
Cordoba
ISSN
2164-7143
Print_ISBN
978-1-4577-1676-8
Type
conf
DOI
10.1109/ISDA.2011.6121702
Filename
6121702