DocumentCode :
830075
Title :
Semisupervised learning for molecular profiling
Author :
Furlanello, Cesare ; Serafini, Maria ; Merler, Stefano ; Jurman, Giuseppe
Author_Institution :
ITC-irst, Trento, Italy
Volume :
2
Issue :
2
fYear :
2005
Firstpage :
110
Lastpage :
118
Abstract :
Class prediction and feature selection are two learning tasks that are closely paired in the search for molecular profiles from microarray data. Researchers have become aware of how easy it is to incur a selection bias effect, and complex validation setups are required to avoid overly optimistic estimates of the predictive accuracy of the models and incorrect gene selections. This paper describes a semisupervised pattern discovery approach that uses the by-products of complete validation studies on experimental setups for gene profiling. In particular, we introduce the study of the patterns of single sample responses (sample-tracking profiles) to the gene selection process induced by typical supervised learning tasks in microarray studies. We derive sample-tracking profiles as the aggregated off-training evaluation of SVM models of increasing gene panel sizes. Genes are ranked by E-RFE, an entropy-based variant of recursive feature elimination for support vector machines (RFE-SVM). A dynamic time warping (DTW) algorithm is then applied to define a metric between sample-tracking profiles. Unsupervised clustering based on the DTW metric automates the discovery of outliers and of subtypes of different molecular profiles. Applications are described on synthetic data and in two gene expression studies.
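The abstract names dynamic time warping as the basis for a metric between sample-tracking profiles but does not state which DTW variant is used. As an illustration only, the following is a minimal sketch of the textbook DTW recurrence, assuming each profile is a plain list of numbers (one value per gene panel size) and assuming absolute difference as the local cost. The function name `dtw_distance` and the example profiles are hypothetical and are not taken from the paper.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two numeric sequences.

    Assumption: local cost is |a[i] - b[j]| and there is no window constraint.
    The paper's actual DTW settings are not given in the abstract.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical sample-tracking profiles (made-up values for illustration).
profile_a = [0.0, 0.0, 1.0, 2.0]
profile_b = [0.0, 1.0, 2.0]
print(dtw_distance(profile_a, profile_b))
```

Pairwise values computed this way could be collected into a distance matrix and passed to any clustering routine that accepts precomputed distances. Note that this basic DTW cost does not satisfy the triangle inequality in general, so it is a dissimilarity measure rather than a metric in the strict sense.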
Keywords :
biology computing; entropy; genetics; learning (artificial intelligence); molecular biophysics; support vector machines; class prediction; dynamic time warping; entropy; feature selection; gene expression; gene profiling; gene selections; learning tasks; microarray data; molecular profiling; recursive feature elimination; sample-tracking profiles; semisupervised learning; semisupervised pattern discovery approach; support vector machines; unsupervised clustering; Accuracy; Bioinformatics; Clustering algorithms; Costs; Gene expression; Predictive models; Semisupervised learning; Supervised learning; Support vector machine classification; Support vector machines; Machine learning; bioinformatics databases; biology and genetics; classifier design and evaluation; clustering; data mining; feature evaluation and selection; pattern analysis; similarity measures; Algorithms; Artificial Intelligence; Cluster Analysis; Gene Expression; Gene Expression Profiling; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2005.28
Filename :
1438348