• DocumentCode
    3242579
  • Title
    Undiagnosed samples aided rough set feature selection for medical data
  • Author
    Donghai Guan ; Weiwei Yuan ; Zilong Jin ; Sungyoung Lee
  • Author_Institution
    Coll. of Autom., Harbin Eng. Univ., Harbin, China
  • fYear
    2012
  • fDate
    6-8 Dec. 2012
  • Firstpage
    639
  • Lastpage
    644
  • Abstract
    Medical data often consist of a large number of disease markers. For medical data analysis, some disease markers are not helpful and sometimes even have negative effects. Feature selection is therefore necessary, as it can remove these unimportant disease markers. Among the many feature selection methods, rough set based feature selection (RSFS) has been widely used. Unlike other methods, RSFS is completely data-driven: it does not require any additional information such as probability distributions. Traditional RSFS methods extract information only from the diagnosed samples, so they usually require a large number of diagnosed samples to achieve good feature selection performance. However, in many real medical applications, diagnosed samples are limited, yet the number of undiagnosed samples is large. Motivated by semi-supervised learning, in this paper we propose a novel RSFS method that can learn from both diagnosed and undiagnosed samples. This method is called undiagnosed samples aided rough set feature selection (USA-RSFS). Its main benefit is that it reduces the number of diagnosed samples required, with the help of undiagnosed ones. Finally, the promising performance of USA-RSFS is validated through a set of experiments on medical datasets.
  • Keywords
    data analysis; learning (artificial intelligence); medical administrative data processing; patient diagnosis; rough set theory; statistical distributions; USA-RSFS; disease markers; medical data analysis; medical datasets; probability distributions; rough set based feature selection; semi-supervised learning methodology; undiagnosed samples aided rough set feature selection; Medical diagnostic imaging; feature selection; rough set; semi-supervised learning; undiagnosed samples;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Distributed and Grid Computing (PDGC), 2012 2nd IEEE International Conference on
  • Conference_Location
    Solan
  • Print_ISBN
    978-1-4673-2922-4
  • Type
    conf
  • DOI
    10.1109/PDGC.2012.6449895
  • Filename
    6449895