Title :
Improving semi-supervised fuzzy c-means classification of Breast Cancer data using feature selection
Author :
Lai, Daphne Teck Ching ; Garibaldi, Jonathan M.
Author_Institution :
Sch. of Comput. Sci., Univ. of Nottingham, Nottingham, UK
Abstract :
In previous work, six clinically novel and useful subgroups of breast cancer were identified on a database of biomarkers, using rules and clinicians' expertise to combine the solutions of three different clustering algorithms. The motivation for the present work is to reproduce this classification with a single clustering method. In the long term, we hope to produce a clinically useful classification from fewer features (biomarkers), reducing the time and cost of running complex and expensive clinical tests. The aim of this paper is therefore to investigate combining feature selection with semi-supervised fuzzy c-means (ssFCM) to reduce the number of features while maintaining accuracy (defined as agreement with the previous classification), both on our breast cancer biomarker data and on other benchmark datasets. We report experimental results for four feature selection techniques, selecting 10, 15 and 17 of the original 25 breast cancer biomarkers. We varied the amount of labelled data from 10% to 60% of the training data and evaluated classification accuracy using cross-validation. Classification accuracy increased when 15 or 17 breast cancer biomarkers were used. With SVM-RFE (support vector machine recursive feature elimination) and CFS (correlation-based feature selection), classification accuracy also improved on three UCI datasets: Arrhythmia, Cardiotocography and Yeast.
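The abstract does not specify which ssFCM variant is used, so the following is only a minimal sketch under an assumed, simple variant (not necessarily the authors'): labelled points keep fixed crisp memberships, unlabelled points are updated with the standard fuzzy c-means membership rule, and each point's class is taken as its highest-membership cluster. The function names `ssfcm` and `predict`, the fuzzifier `m`, and the `tol`/`max_iter` stopping parameters are hypothetical choices for illustration. In the setting described above, `points` would hold only the selected biomarker columns.

```python
import math


def ssfcm(points, n_clusters, labels, m=2.0, max_iter=100, tol=1e-6):
    """Semi-supervised fuzzy c-means sketch (assumed clamping variant).

    points: list of feature vectors (lists of floats).
    labels: cluster index for each labelled point, None for unlabelled ones.
    Returns (centres, memberships); memberships[i][k] is the degree u_ik
    to which point i belongs to cluster k.
    """
    n, dim = len(points), len(points[0])
    # Labelled points get fixed crisp memberships; unlabelled points start
    # uniform, so the first centres are pulled apart only by the labelled data.
    u = [
        [1.0 if k == lab else 0.0 for k in range(n_clusters)]
        if lab is not None
        else [1.0 / n_clusters] * n_clusters
        for lab in labels
    ]
    centres = [[0.0] * dim for _ in range(n_clusters)]
    for _ in range(max_iter):
        # Centre update: mean of all points weighted by u_ik ** m.
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            if total > 0.0:
                centres[k] = [
                    sum(w[i] * points[i][d] for i in range(n)) / total
                    for d in range(dim)
                ]
        # Membership update (standard FCM rule), applied to unlabelled points only.
        shift = 0.0
        for i, lab in enumerate(labels):
            if lab is not None:
                continue
            dists = [math.dist(points[i], c) for c in centres]
            if min(dists) == 0.0:
                # Point lies exactly on one or more centres: share membership among them.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new = [h / sum(hits) for h in hits]
            else:
                p = 2.0 / (m - 1.0)
                new = [
                    1.0 / sum((dists[k] / dists[j]) ** p for j in range(n_clusters))
                    for k in range(n_clusters)
                ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        # Stop once no unlabelled membership changes by more than tol.
        if shift < tol:
            break
    return centres, u


def predict(memberships):
    """Hard label per point: the cluster with the highest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]
```

For example, with two well-separated groups of 2-D points and one labelled point per group, `predict` assigns each unlabelled point to the group of its labelled anchor. Accuracy in the abstract's sense (agreement with the previous classification) could then be estimated by comparing `predict` outputs against the reference subgroup labels on held-out folds. Clamping the labelled memberships is the simplest way to inject supervision; other ssFCM formulations instead add a supervision term to the objective function.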
Keywords :
biological tissues; cancer; feature extraction; fuzzy set theory; learning (artificial intelligence); medical computing; pattern classification; pattern clustering; support vector machines; CFS; SVM-RFE; UCI dataset; arrhythmia; biomarker database; breast cancer biomarker data; breast cancer data; cardiotocography; classification accuracy; clinician expertise; clustering algorithm; cross-validation; feature selection technique; semisupervised fuzzy c-means classification; ssFCM; yeast; Accuracy; Biomarkers; Breast cancer; Partitioning algorithms; Prediction algorithms; Training; Training data; FCM; breast cancer classification; feature selection; semi-supervised;
Conference_Title :
Fuzzy Systems (FUZZ), 2013 IEEE International Conference on
Conference_Location :
Hyderabad
Print_ISBN :
978-1-4799-0020-6
DOI :
10.1109/FUZZ-IEEE.2013.6622544