DocumentCode :
2734818
Title :
Effect of feature extraction and feature selection on expression data from epithelial ovarian cancer
Author :
Turkeli, Y. ; Ercil, A. ; Sezerman, O.U.
Author_Institution :
Lab. of Comput. Biol., Sabanci Univ., Istanbul, Turkey
Volume :
4
fYear :
2003
fDate :
17-21 Sept. 2003
Firstpage :
3559
Abstract :
Classifying the gene expression levels of normal and cancerous cells and identifying the genes that contribute most to this distinction offer an alternative means of diagnosis. We have investigated the effect of feature extraction and feature selection on clustering of the expression data on two different data sets for ovarian cancer. One data set consisted of 2176 transcripts from 30 samples, nine from normal ovarian epithelial cells and 21 from cancerous ones. The other data set had 7129 transcripts coming from 27 tumor and four normal ovarian tissues. Hierarchical clustering algorithms employing complete-link, average-link and Ward's method were implemented for comparative evaluation. Principal component analysis was applied for feature extraction and resulted in 100% segregation. Feature selection was performed to identify the most distinguishing genes using CART® software. Selected features were able to cluster the data with 100% success. The results suggest that adoption of feature extraction and selection enhances the quality of clustering of gene expression data for ovarian cancer. Identification of distinguishing genes is a more complex problem that requires incorporating pathway knowledge with statistical and machine learning methods.
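As a rough illustration of the pipeline the abstract describes (principal component analysis for feature extraction, followed by hierarchical clustering with complete-link, average-link and Ward linkage), here is a minimal sketch. It is not the authors' implementation: the data are synthetic, the sample split (9 versus 21) only mirrors the first data set's group sizes, and the feature count, shift size, number of components, and the helper names `pca_scores` and `two_cluster_agreement` are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def pca_scores(X, n_components):
    """Project samples onto the leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def two_cluster_agreement(features, labels, method):
    """Cut the dendrogram into two clusters and compare with known labels."""
    Z = linkage(features, method=method)         # Euclidean distances
    assigned = fcluster(Z, t=2, criterion="maxclust") - 1
    match = np.mean(assigned == labels)
    return max(match, 1.0 - match)               # cluster ids are arbitrary


# Synthetic stand-in for an expression matrix (assumption, not real data):
# 30 samples x 200 features, 9 "normal" and 21 "cancerous" samples.
rng = np.random.default_rng(0)
labels = np.array([0] * 9 + [1] * 21)
X = rng.normal(size=(30, 200))
X[labels == 1, :50] += 5.0                       # shift 50 hypothetical genes

scores = pca_scores(X, n_components=2)

results = {
    method: two_cluster_agreement(scores, labels, method)
    for method in ("complete", "average", "ward")
}

for method, agreement in results.items():
    print(f"{method:>8}: agreement with known labels = {agreement:.2f}")
```

The agreement score here is only a simple proxy for the "segregation" reported in the abstract; the paper may have assessed clustering quality differently.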
Keywords :
biological organs; cancer; cellular biophysics; data analysis; feature extraction; genetics; pattern clustering; principal component analysis; tumours; CART® software; Ward method; average-link; cancerous cells; complete-link; data sets; diagnosis; epithelial ovarian cancer; expression data; feature extraction; feature selection; gene expression data clustering enhancement; gene expression levels; genes identification; hierarchical clustering algorithms; machine learning methods; normal cells; ovarian tissues; pathway knowledge; principal component analysis; samples; statistical learning method; Cancer; Cells (biology); Classification tree analysis; Clustering algorithms; Computational biology; Feature extraction; Gene expression; Laboratories; Principal component analysis; Software performance;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Engineering in Medicine and Biology Society, 2003. Proceedings of the 25th Annual International Conference of the IEEE
ISSN :
1094-687X
Print_ISBN :
0-7803-7789-3
Type :
conf
DOI :
10.1109/IEMBS.2003.1280921
Filename :
1280921