DocumentCode
3242579
Title
Undiagnosed samples aided rough set feature selection for medical data
Author
Donghai Guan ; Weiwei Yuan ; Zilong Jin ; Sungyoung Lee
Author_Institution
Coll. of Autom., Harbin Eng. Univ., Harbin, China
fYear
2012
fDate
6-8 Dec. 2012
Firstpage
639
Lastpage
644
Abstract
Medical data often consists of a large number of disease markers. For medical data analysis, some disease markers are not helpful and sometimes even have negative effects. Therefore, feature selection is necessary, as it can remove those unimportant disease markers. Among the many feature selection methods, rough set based feature selection (RSFS) has been widely used. Unlike other methods, RSFS is completely data-driven: it does not require any additional information such as probability distributions. Traditional RSFS methods extract information only from the diagnosed samples. Therefore, they usually require a large number of diagnosed samples to achieve good feature selection performance. However, in many real medical applications, diagnosed samples are limited, while undiagnosed samples are plentiful. Motivated by semi-supervised learning, in this paper we propose a novel RSFS method that can learn from both diagnosed and undiagnosed samples. This method is called undiagnosed samples aided rough set feature selection (USA-RSFS). Its main benefit is to reduce the number of diagnosed samples required, with the help of undiagnosed ones. Finally, the promising performance of USA-RSFS is validated through a set of experiments on medical datasets.
Keywords
data analysis; learning (artificial intelligence); medical administrative data processing; patient diagnosis; rough set theory; statistical distributions; USA-RSFS; disease markers; medical data analysis; medical datasets; probability distributions; rough set based feature selection; semi-supervised learning methodology; undiagnosed samples aided rough set feature selection; Medical diagnostic imaging; feature selection; rough set; semi-supervised learning; undiagnosed samples;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Distributed and Grid Computing (PDGC), 2012 2nd IEEE International Conference on
Conference_Location
Solan
Print_ISBN
978-1-4673-2922-4
Type
conf
DOI
10.1109/PDGC.2012.6449895
Filename
6449895