Title :
CTFMining: A Method to Predict Candidate Disease Genes Based on the Combined Network Topological Features Mining
Author :
Chen, Lina ; Zhao, Yan ; Zhang, Liangcai ; Shang, Yukui ; Wang, Qian ; Li, Wan ; Wang, Hong ; Li, Xia
Author_Institution :
Coll. of Bioinf. Sci. & Technol., Harbin Med. Univ., Harbin, China
Abstract :
Genes with similar functions might lead to similar phenotypes and tend to cluster together in the protein-protein interaction (PPI) network. The genes responsible for cardiovascular artery disease (CAD) should therefore be identifiable from their combined network topological features. Here we introduce CTFMining, a method that predicts candidate disease genes by mining combined network topological features. Four network topological features were defined to describe the network characteristics of genes. We then trained support vector machines (SVMs) on each single topological feature and on combinations of features to screen for disease genes. Combined features predicted disease genes better than any single feature, and an optimal feature combination was found that distinguishes disease genes from non-disease genes. Candidate disease genes were predicted with this optimal combination, and the intersection of 10,000 predictions was taken as the final prediction. In total, 224 candidate disease genes were predicted using SVMs. Nearly 86% of the candidate disease genes were found to be associated with CAD, as verified by Priortizer or PandS. The candidate disease genes were likely to share functions with known CAD disease genes. The optimal feature combination can therefore distinguish disease genes from non-disease genes well. As more interaction data become available and more disease genes are discovered, the method should predict novel candidate genes more accurately.
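Two steps in the abstract can be sketched in a few lines: enumerating the single and combined feature sets that are each used to train an SVM, and taking the intersection of repeated predictions as the final candidate list. The sketch below is only an illustration under stated assumptions. The abstract does not name the four topological features, so `feature_1` to `feature_4` are placeholders, the gene labels are hypothetical, and the SVM training itself is omitted.

```python
from itertools import combinations

# Placeholder names: the abstract does not say which four
# topological features were defined.
FEATURES = ["feature_1", "feature_2", "feature_3", "feature_4"]


def feature_subsets(features):
    """Return every non-empty subset of the features, smallest first."""
    return [
        combo
        for size in range(1, len(features) + 1)
        for combo in combinations(features, size)
    ]


def consensus(prediction_runs):
    """Keep only the genes that appear in every prediction run."""
    runs = [set(run) for run in prediction_runs]
    if not runs:
        return set()
    return set.intersection(*runs)


# 4 single features plus 11 combinations give 15 candidate feature sets.
subsets = feature_subsets(FEATURES)

# Three toy runs with hypothetical gene labels stand in for the
# 10,000 predictions mentioned in the abstract.
runs = [
    {"GENE_A", "GENE_B", "GENE_C"},
    {"GENE_B", "GENE_C", "GENE_D"},
    {"GENE_B", "GENE_C"},
]
final_candidates = consensus(runs)
```

Taking the intersection is a strict rule: a gene is kept only if every run predicts it, which favours precision over recall.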
Keywords :
bioinformatics; blood vessels; data mining; diseases; feature extraction; genetics; molecular biophysics; proteins; support vector machines; CTFMining; PandS; Priortizer; candidate disease genes prediction; cardiovascular artery disease; network topological features mining; phenotypes; protein-protein interaction network; support vector machines; Arteries; Bioinformatics; Cardiac disease; Cardiology; Cardiovascular diseases; Coronary arteriosclerosis; Educational institutions; Humans; Proteins; Support vector machines;
Conference_Title :
Bioinformatics and Biomedical Engineering, 2009. ICBBE 2009. 3rd International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-2901-1
Electronic_ISBN :
978-1-4244-2902-8
DOI :
10.1109/ICBBE.2009.5162565