DocumentCode
1646026
Title
Data mining the PIMA dataset using rough set theory with a special emphasis on rule reduction
Author
Khan, Aurangieb ; Revett, Kenneth
Author_Institution
Dept. of CIS, Luton Univ., UK
fYear
2004
Firstpage
334
Lastpage
339
Abstract
This paper describes how rough set theory can be used as a tool for analyzing relatively complex decision tables such as the Pima Indian Diabetes Database (PIDD). We applied Rosetta, a public-domain implementation of rough set methods, to the PIDD in order to determine how to generate a predictive rule set with the highest accuracy and the fewest rules. A reduced rule set is advantageous because it focuses attention on the salient attributes and makes application in clinical practice more efficient (and more likely). In this paper, we report the use of a genetic-algorithm-based rough set approach to the classification of diabetes, which achieved a success rate of 83% on the test set. This classification accuracy compares favorably with other reported results, which ranged from 65% to 75%. In addition, we achieved this accuracy with fewer than 100 rules. The high accuracy and low rule count support the use of rough sets as a data mining tool for biological databases.
Keywords
biology computing; data mining; database management systems; genetic algorithms; rough set theory; Pima Indian Diabetes Database; biological databases; data mining; genetic algorithm; predictive rule set; rough set theory; rule reduction; Computational Intelligence Society; Data mining; Databases; Diseases; Genetics; Medical diagnostic imaging; Neural networks; Rough sets; Set theory; Testing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Multitopic Conference, 2004. Proceedings of INMIC 2004. 8th International
Print_ISBN
0-7803-8680-9
Type
conf
DOI
10.1109/INMIC.2004.1492899
Filename
1492899