DocumentCode :
8209
Title :
Protein Function Prediction Using Multilabel Ensemble Classification
Author :
Yu, Guoxian ; Rangwala, Huzefa ; Domeniconi, Carlotta ; Zhang, Guoji ; Yu, Zhiwen
Author_Institution :
Coll. of Comput. & Inf. Sci., Southwest Univ., Beibei, China
Volume :
10
Issue :
4
fYear :
2013
fDate :
July-Aug. 2013
Firstpage :
1045
Lastpage :
1057
Abstract :
High-throughput experimental techniques produce several kinds of heterogeneous proteomic and genomic data sets. To computationally annotate proteins, it is necessary and promising to integrate these heterogeneous data sources. Some methods transform these data sources into different kernels or feature representations. Next, these kernels are linearly (or nonlinearly) combined into a composite kernel. The composite kernel is utilized to develop a predictive model to infer the function of proteins. A protein can have multiple roles and functions (or labels). Therefore, multilabel learning methods are also adapted for protein function prediction. We develop a transductive multilabel classifier (TMC) to predict multiple functions of proteins using several unlabeled proteins. We also propose a method called transductive multilabel ensemble classifier (TMEC) for integrating the different data sources using an ensemble approach. The TMEC trains a graph-based multilabel classifier on each single data source, and then combines the predictions of the individual classifiers. We use a directed birelational graph to capture the relationships between pairs of proteins, between pairs of functions, and between proteins and functions. We evaluate the effectiveness of the TMC and TMEC to predict the functions of proteins on three benchmarks. We show that our approaches perform better than recently proposed protein function prediction methods on composite and multiple kernels. The code, data sets used in this paper and supplemental material are available at https://sites.google.com/site/guoxian85/tmec.
Keywords :
benchmark testing; biology computing; genomics; proteomics; TMC method; TMEC method; benchmark; composite kernel; computational annotation; heterogeneous genomic data sets; heterogeneous proteomic data sets; high throughput experimental techniques; multilabel ensemble classification; protein function prediction; transductive multilabel classifier; transductive multilabel ensemble classifier; Bioinformatics; Computational biology; Correlation; IEEE transactions; Kernel; Proteins; Vectors; Multilabel ensemble classifiers; directed birelational graph
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2013.111
Filename :
6600689