Title :
Predicting Class-II MHC Binding Peptide Using Global Representation of Peptides
Author_Institution :
Sch. of Math. & Stat., South-Central Univ. for Nat., Wuhan, China
Abstract :
Binding between peptides and major histocompatibility complex class II (MHC-II) molecules is key to activating the T-cell immune response. Peptides that bind MHC molecules are known as T-cell epitopes, and identifying epitopes is critical for computer-aided drug design. However, the variable lengths of binding peptides hinder the use of traditional machine learning methods. In this paper, we propose a method that uses whole peptides to predict MHC-II binding affinity from sequence-derived structural and physicochemical properties. First, several groups of structural and physicochemical features derived from protein sequences are adopted; these transform variable-length peptides into fixed-length feature vectors. The sequence-derived features are then combined, and the optimal feature subset is selected by MRMR (minimum Redundancy Maximum Relevance) feature selection. Subsequently, support vector machines (SVMs) are used as the classification engine to construct the prediction models. The models are evaluated on benchmark datasets. Compared with existing popular quantitative methods, the proposed method gives better or comparable performance, yielding an average AUC of 0.82 on the IEDB datasets and an average AUC of 0.82 on Wang's dataset. By using a full-length representation of the peptides, the proposed method performs satisfactorily relative to existing methods.
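The encoding and feature-selection steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not list the exact feature groups, so amino-acid composition plus mean Kyte-Doolittle hydropathy stand in for them, and MRMR is shown in its greedy difference form (relevance minus mean redundancy) with mutual information estimated after equal-width binning. The function names encode, discretize, mutual_information and mrmr are hypothetical.

```python
import math
from collections import Counter

# Assumption: Kyte-Doolittle hydropathy as an illustrative physicochemical descriptor.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode(peptide):
    """Hypothetical encoder: map a peptide of any length to a fixed 21-dim
    vector (20 amino-acid composition fractions + mean hydropathy)."""
    peptide = peptide.upper()
    if not peptide or any(aa not in KYTE_DOOLITTLE for aa in peptide):
        raise ValueError("peptide must be a non-empty string of standard residues")
    counts = Counter(peptide)
    n = len(peptide)
    composition = [counts[aa] / n for aa in AMINO_ACIDS]
    hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in peptide) / n
    return composition + [hydropathy]


def discretize(values, bins=3):
    """Equal-width binning so mutual information can be estimated by counting."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in Counter(zip(x, y)).items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr(X, y, k):
    """Greedy MRMR (difference form): at each step pick the feature that
    maximises relevance to y minus its mean MI with already-selected features."""
    n_features = len(X[0])
    cols = [discretize([row[j] for row in X]) for j in range(n_features)]
    relevance = [mutual_information(c, y) for c in cols]
    selected = [max(range(n_features), key=lambda j: relevance[j])]
    while len(selected) < min(k, n_features):
        best, best_score = None, -math.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = sum(mutual_information(cols[j], cols[s]) for s in selected) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

Peptides of different lengths, such as "AAG" and "PKYVKQNTLKLAT", both map to 21-dimensional vectors, so they can share one feature matrix. The SVM step is omitted here; the selected columns could, for example, be passed to scikit-learn's SVC and scored with roc_auc_score to obtain an AUC of the kind reported above.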
Keywords :
CAD; benchmark testing; biochemistry; biology computing; cellular biophysics; drugs; learning (artificial intelligence); molecular biophysics; proteins; support vector machines; IEDB datasets; Wang dataset; activating T-cell immune response; T-cell epitopes; benchmark datasets; class-II MHC binding peptide; computer-aided drug design; fixed-length feature vectors; global representation; major histocompatibility complex class II molecule binding; machine learning methods; minimum redundancy maximum relevance feature selection; peptides binding; physicochemical properties; popular quantitative methods; protein; sequence-derived structure; support vector machines; Amino acids; Bioinformatics; Correlation; Encoding; Immune system; Peptides; Proteins; MHC-II quantitative prediction; T-cell immunity; feature selection; sequence-derived structure and physicochemical features;
Conference_Title :
Intelligent Computation and Bio-Medical Instrumentation (ICBMI), 2011 International Conference on
Conference_Location :
Wuhan, Hubei
Print_ISBN :
978-1-4577-1152-7
DOI :
10.1109/ICBMI.2011.74