Regional Information Center for Science and Technology - Detection of Outlier Residues for Improving Interface Prediction in Protein Heterocomplexes

DocumentCode :

1497754

Title :

Detection of Outlier Residues for Improving Interface Prediction in Protein Heterocomplexes

Author :

Chen, Peng ; Wong, Limsoon ; Li, Jinyan

Author_Institution :

Inst. of Intell. Machines, Hefei, China

Volume :

Issue :

fYear :

2012

Firstpage :

1155

Lastpage :

1165

Abstract :

Sequence-based understanding and identification of protein binding interfaces is a challenging research topic due to the complexity of protein systems and the imbalanced distribution between interface and noninterface residues. This paper presents an outlier detection idea to address the redundancy problem in protein interaction data. The cleaned training data are then used to improve prediction performance. We use three novel measures to describe the extent to which a residue is considered an outlier in comparison to the other residues: the distance of a residue instance from the center instance of all residue instances of the same class label (Dist), the probability of the class label of the residue instance (PCL), and the importance of within-class and between-class (IWB) residue instances. Outlier scores are computed by integrating the three factors; instances with a sufficiently large score are treated as outliers and removed. The data sets without outliers are taken as input for a support vector machine (SVM) ensemble. The proposed SVM ensemble trained on input data without outliers performs better than the same ensemble trained on data with outliers. Our method is also more accurate than many methods from the literature on benchmark data sets. From our empirical studies, we found that some outlier interface residues are truly close to noninterface regions, and some outlier noninterface residues are close to interface regions.
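
As a rough illustration of the Dist measure only, the following Python sketch scores each instance by its Euclidean distance to the mean vector of its own class and drops instances above a cutoff. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the formulas for PCL or IWB, how the three factors are integrated, or the cutoff used, so those parts are omitted, and the function names and the `threshold` parameter are hypothetical.

```python
import math
from collections import defaultdict


def class_centroids(X, y):
    """Return the mean feature vector of each class label."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def dist_scores(X, y):
    """Euclidean distance of every instance to the centroid of its own class.

    Assumption: the "center instance" of a class is taken to be the class mean.
    """
    centroids = class_centroids(X, y)
    return [math.dist(row, centroids[label]) for row, label in zip(X, y)]


def remove_outliers(X, y, threshold):
    """Keep only instances whose distance score is at most `threshold`.

    `threshold` is a hypothetical cutoff; the abstract does not specify one.
    """
    scores = dist_scores(X, y)
    kept = [(row, label) for row, label, s in zip(X, y, scores) if s <= threshold]
    return [row for row, _ in kept], [label for _, label in kept]
```

The filtered lists returned by `remove_outliers` would then serve as training input for a classifier, such as the SVM ensemble the abstract describes.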

Keywords :

benchmark testing; bioinformatics; molecular biophysics; proteins; support vector machines; SVM ensemble; benchmark data sets; interface prediction; noninterface region; outlier interface residues; protein binding interface; protein heterocomplex; protein interaction data; protein systems; redundancy problem; residue distance; residue instance probability; sequence-based understanding; support vector machine ensemble; Bioinformatics; Educational institutions; Proteins; Support vector machines; Training; Training data; Vectors; Outlier detection; SVM ensemble; protein-protein interaction; Area Under Curve; Computational Biology; Databases, Protein; Protein Binding; Protein Interaction Domains and Motifs; Proteins; Sequence Analysis, Protein; Support Vector Machines

fLanguage :

English

Journal_Title :

Computational Biology and Bioinformatics, IEEE/ACM Transactions on

Publisher :

IEEE

ISSN :

1545-5963

Type :

jour

DOI :

10.1109/TCBB.2012.58

Filename :

6185534

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1497754