Predicting Class-II MHC Binding Peptide Using Global Representation of Peptides

Author

Niu, Yanqing

Author_Institution

Sch. of Math. & Stat., South-Central Univ. for Nat., Wuhan, China

fYear

2011

fDate

14-17 Dec. 2011

Firstpage

308

Lastpage

312

Abstract

Binding between peptides and major histocompatibility complex class II (MHC-II) molecules is key to activating the T-cell immune response. Peptides that bind MHC molecules are known as T-cell epitopes, and identifying epitopes is critical for computer-aided drug design. However, the variable lengths of binding peptides hinder the use of traditional machine learning methods. In this paper, we propose a method that uses whole peptides to predict MHC-II binding affinity from sequence-derived structural and physicochemical properties. First, several groups of structural and physicochemical features derived from protein sequences are adopted; these transform variable-length peptides into fixed-length feature vectors. The sequence-derived features are then combined, and an optimal feature subset is selected by MRMR (minimum redundancy maximum relevance) feature selection. Next, support vector machines (SVMs) are used as the classification engine to construct the prediction models. The models are evaluated on benchmark datasets. Compared with existing popular quantitative methods, the proposed method achieves better or comparable performance, with an average AUC of 0.82 on the IEDB datasets and an average AUC of 0.82 on Wang's dataset. By using a full-length representation of the peptides, the proposed method performs satisfactorily relative to existing methods.
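The abstract states that sequence-derived structural and physicochemical features turn variable-length peptides into fixed-length vectors. The paper's exact feature groups are not listed in this record, so the sketch below only illustrates the idea under assumed features: the 20-dimensional amino-acid composition plus the mean Kyte-Doolittle hydropathy, giving 21 values for a peptide of any length. The function name `global_features` is hypothetical and is not the authors' code.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index (assumed example property scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def global_features(peptide: str) -> list[float]:
    """Map a peptide of any length to a fixed 21-dimensional vector.

    Dimensions 0-19: amino-acid composition (fraction of each residue).
    Dimension 20: mean Kyte-Doolittle hydropathy over the whole peptide.
    """
    seq = peptide.upper()
    # Reject empty input and non-standard residue codes.
    if not seq or any(aa not in KYTE_DOOLITTLE for aa in seq):
        raise ValueError(f"unsupported peptide: {peptide!r}")
    counts = Counter(seq)
    n = len(seq)
    composition = [counts[aa] / n for aa in AMINO_ACIDS]
    hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / n
    return composition + [hydropathy]


print(len(global_features("PKYVKQNTLKLAT")))          # 13-residue peptide
print(len(global_features("AAAAAAAAAAAAAAAAAAAA")))   # 20-residue peptide
```

Because every dimension is averaged over the whole sequence, a 13-mer and a 20-mer yield vectors of the same length, so a standard SVM can consume them without trimming peptides to a fixed window.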
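MRMR feature selection, named in the abstract, greedily adds the feature whose mutual information with the class label is highest after subtracting its average mutual information with the features already chosen (the difference, or MID, form of the criterion). The sketch below assumes the features have already been discretized, because this simple plug-in mutual-information estimate works on discrete values. `mutual_information` and `mrmr` are hypothetical helper names, not the authors' implementation.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats for two discrete sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def mrmr(features, labels, k):
    """Greedy MRMR (MID variant): return indices of up to k selected features.

    features: list of columns, each a list of discrete values per sample.
    labels:   list of discrete class labels, one per sample.
    """
    if k <= 0 or not features:
        return []
    relevance = [mutual_information(f, labels) for f in features]
    # First pick: the single most relevant feature.
    selected = [max(range(len(features)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(features)):
        best, best_score = None, -math.inf
        for j in range(len(features)):
            if j in selected:
                continue
            # Average redundancy with the already-selected features.
            redundancy = sum(
                mutual_information(features[j], features[s]) for s in selected
            ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

For example, given two copies of one informative feature and a second, independent informative feature, the second pick is the independent feature rather than the duplicate, because the duplicate's relevance is cancelled by its redundancy.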
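The reported results are average AUC values. One standard way to compute AUC from classifier scores, such as SVM decision values, is the Mann-Whitney formulation: the fraction of (binder, non-binder) pairs in which the binder receives the higher score, with ties counted as one half. The `roc_auc` function below is an illustrative sketch; the abstract does not state which tool the authors used to compute AUC.

```python
def roc_auc(scores, labels):
    """AUC as the Mann-Whitney statistic: P(score_pos > score_neg), ties = 1/2.

    scores: real-valued classifier outputs (e.g. SVM decision values).
    labels: 1 for binder, 0 for non-binder.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # Compare every binder score with every non-binder score.
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 3 of 4 pairs ranked correctly
```

The pairwise loop costs O(P*N) for P binders and N non-binders; a rank-based computation gives the same value and scales better on large benchmark sets.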

Keywords

CAD; benchmark testing; biochemistry; biology computing; cellular biophysics; drugs; learning (artificial intelligence); molecular biophysics; proteins; support vector machines; IEDB datasets; Wang dataset; activating T-cell immune response; as T-cell epitopes; benchmark datasets; class-II MHC binding peptide; computer-aided drug design; fixed-length feature vectors; global representation; histocompatibility complex class II molecule binding; machine learning methods; minimum redundancy maximum relevance feature selection; peptides binding; physicochemical properties; popular quantitative methods; protein; sequence-derived structure; support vector machines; Amino acids; Bioinformatics; Correlation; Encoding; Immune system; Peptides; Proteins; MHC-II quantitative prediction; T-cell immunity; feature selection; sequence-derived structure and physicochemical features;

fLanguage

English

Publisher

IEEE

Conference_Title

Intelligent Computation and Bio-Medical Instrumentation (ICBMI), 2011 International Conference on

Conference_Location

Wuhan, Hubei

Print_ISBN

978-1-4577-1152-7

Type

conf

DOI

10.1109/ICBMI.2011.74

Filename

6131770