DocumentCode :
636243
Title :
Hierarchical clustering combining numerical and biological similarities for gene expression data classification
Author :
Bosio, Mattia ; Salembier, Philippe ; Bellot, Pau ; Oliveras-Verges, Albert
Author_Institution :
Dept. of Signal Theor. & Commun., Tech. Univ. of Catalonia UPC, Barcelona, Spain
fYear :
2013
fDate :
3-7 July 2013
Firstpage :
584
Lastpage :
587
Abstract :
High-throughput data analysis is a challenging problem due to the vast amount of available data. A major concern is to develop algorithms that provide both accurate numerical predictions and biologically relevant results. A wide variety of tools in the literature use biological knowledge to evaluate analysis results. Only recently have some works included biological knowledge inside the analysis process itself, improving the prediction results. In this work, a knowledge integration scheme is proposed to improve the microarray classification results from [3]. Biological knowledge is used to infer a biological similarity, which is combined with the classical numerical similarity. The resulting similarity measure is used in a hierarchical clustering process that produces new features called metagenes. The goal of integrating the numerical and biological similarities is to produce metagenes involving more useful and significant gene signatures. The proposed algorithm has been tested on 7 publicly available datasets, and the results have been compared with the state-of-the-art method. The knowledge inclusion has proven beneficial both for the predictive ability, improving the repeatability of the results, and for the biological relevance, as assessed by evaluating the produced signatures with two gene list analysis tools.
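The combination of numerical and biological similarity described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the numerical similarity is taken as the absolute Pearson correlation, the biological similarity as the Jaccard index of annotation term sets, the two are mixed with a hypothetical weight `alpha`, and a metagene is formed as the plain average of the two merged profiles. The function names (`combined_similarity`, `build_metagenes`) are likewise hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def jaccard(a, b):
    """Jaccard index of two annotation term sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_similarity(f1, f2, alpha=0.5):
    """Assumed mixing rule: alpha * numerical + (1 - alpha) * biological."""
    numerical = abs(pearson(f1["profile"], f2["profile"]))
    biological = jaccard(f1["terms"], f2["terms"])
    return alpha * numerical + (1 - alpha) * biological


def build_metagenes(genes, alpha=0.5, n_merges=None):
    """Greedy agglomerative clustering: repeatedly merge the most similar pair.

    genes maps a gene name to (expression profile, set of annotation terms).
    Returns the metagenes in the order they were created.
    """
    active = {
        name: {"profile": list(p), "terms": set(t), "members": (name,)}
        for name, (p, t) in genes.items()
    }
    limit = len(active) - 1
    merges = limit if n_merges is None else min(n_merges, limit)
    metagenes = []
    for _ in range(merges):
        names = sorted(active)
        pairs = ((a, b) for i, a in enumerate(names) for b in names[i + 1:])
        a, b = max(
            pairs,
            key=lambda ab: combined_similarity(active[ab[0]], active[ab[1]], alpha),
        )
        fa, fb = active.pop(a), active.pop(b)
        merged = {
            # Assumption: the metagene profile is the average of its two children.
            "profile": [(x + y) / 2 for x, y in zip(fa["profile"], fb["profile"])],
            # Assumption: the metagene inherits the union of the annotation terms.
            "terms": fa["terms"] | fb["terms"],
            "members": fa["members"] + fb["members"],
        }
        active["+".join(merged["members"])] = merged
        metagenes.append(merged)
    return metagenes


if __name__ == "__main__":
    toy = {
        "g1": ([1, 2, 3, 4], {"A", "B"}),
        "g2": ([2, 4, 6, 8], {"C"}),
        "g3": ([1, 3, 2, 4], {"A", "B"}),
    }
    for alpha in (1.0, 0.0):
        first = build_metagenes(toy, alpha=alpha, n_merges=1)[0]
        print(alpha, first["members"])
```

In this toy data, g1 and g2 are perfectly correlated but share no annotation terms, while g1 and g3 are less correlated but share all of them, so the choice of `alpha` changes which pair is merged first.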
Keywords :
bioinformatics; data analysis; data integration; genetics; genomics; hierarchical systems; knowledge acquisition; pattern clustering; biological knowledge; biological similarity; classical numerical similarity; gene expression data classification; gene list analysis tool; hierarchical clustering process; high throughput data analysis; knowledge integration scheme; metagenes; microarray classification; prediction results; similarity measure; Algorithm design and analysis; Bioinformatics; Clustering algorithms; Databases; Genomics; Prediction algorithms;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE
Conference_Location :
Osaka
ISSN :
1557-170X
Type :
conf
DOI :
10.1109/EMBC.2013.6609567
Filename :
6609567