Title :
G Protein-Coupled Receptor Classification at the Subfamily Level with Probabilistic Suffix Tree
Author :
Yang, Jingyi; Deogun, Jitender
Author_Institution :
Dept. of Comput. Sci. & Eng., Nebraska-Lincoln Univ., Lincoln, NE
Abstract :
Classifying G protein-coupled receptors (GPCRs) is an interesting topic because GPCRs play an important role in pharmaceutical research. GPCRs have diverse functions and are involved in many biological processes, which makes them ideal targets for novel drugs. This diversity also means that GPCR family members lack overall sequence homology, which makes GPCR classification a very challenging task. Various methods have been applied to this task, such as hidden Markov models (HMMs), decision trees, and support vector machines (SVMs), but their performance is not completely satisfactory. In this paper, we propose a new method for classifying GPCRs into subfamilies. In the proposed method, a probabilistic suffix tree (PST) is used to build a prediction model for each subfamily. To classify a GPCR protein, we compute its similarity score against the PST model of each subfamily using the multi-domain local prediction algorithm, and then assign the protein to the subfamily that gives it the highest score. Our method uses only primary sequence information and is very efficient: model construction and prediction take very little time. Despite this simplicity, it achieves overall accuracies of 98.07% and 97.35% on level I and level II subfamily classification, respectively, in a 2-fold cross-validation test. Given this high accuracy and efficiency, our method is a significant improvement over previously reported methods.
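The classification scheme described in the abstract (one model per subfamily, score the query against each, pick the highest) can be illustrated with a minimal sketch. This record does not give the paper's PST construction criteria or the details of the multi-domain local prediction algorithm, so the code below substitutes a simplified PST-style variable-order Markov model: it keeps every context of up to max_depth residues seen at least min_count times, predicts each residue from the longest stored suffix of its preceding context, and scores a sequence by its global average log-likelihood rather than a local, multi-domain score. The names SimplePST, classify, max_depth, min_count, and smoothing, as well as the toy sequences, are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

# Standard 20-letter amino-acid alphabet, used for additive smoothing.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class SimplePST:
    """Hypothetical, simplified PST-style model (not the paper's exact method).

    Stores next-residue counts for every context (suffix of the preceding
    residues) of length 0..max_depth that occurs at least min_count times.
    A real PST also prunes contexts by comparing their predictive
    distributions with those of their parent suffixes; that step is omitted.
    """

    def __init__(self, max_depth=3, min_count=2, smoothing=0.01):
        self.max_depth = max_depth
        self.min_count = min_count
        self.smoothing = smoothing
        self.counts = {}

    def fit(self, sequences):
        raw = defaultdict(lambda: defaultdict(int))
        for seq in sequences:
            for i, residue in enumerate(seq):
                # Record the residue under each context of length 0..max_depth.
                for d in range(min(self.max_depth, i) + 1):
                    raw[seq[i - d:i]][residue] += 1
        # Keep frequent contexts; the empty context is always kept as a fallback.
        self.counts = {
            ctx: dict(nxt)
            for ctx, nxt in raw.items()
            if ctx == "" or sum(nxt.values()) >= self.min_count
        }
        return self

    def prob(self, context, residue):
        # Use the longest suffix of the context that is stored in the model.
        for d in range(min(self.max_depth, len(context)), -1, -1):
            ctx = context[len(context) - d:]
            if ctx in self.counts:
                nxt = self.counts[ctx]
                total = sum(nxt.values())
                # Additive smoothing so unseen residues get a small probability.
                return (nxt.get(residue, 0) + self.smoothing) / (
                    total + self.smoothing * len(AMINO_ACIDS)
                )
        raise ValueError("model has not been fitted")

    def score(self, seq):
        # Average per-residue log-likelihood (a global score, not a local one).
        total = 0.0
        for i, residue in enumerate(seq):
            total += math.log(self.prob(seq[max(0, i - self.max_depth):i], residue))
        return total / max(len(seq), 1)


def classify(seq, models):
    """Assign seq to the subfamily whose model gives it the highest score."""
    return max(models, key=lambda name: models[name].score(seq))


# Toy usage with made-up "subfamilies" A and B.
models = {
    "A": SimplePST().fit(["ACDACDACDACD", "ACDACDACD"]),
    "B": SimplePST().fit(["WYWYWYWYWY", "WYWYWYW"]),
}
print(classify("ACDACD", models))
```

Normalizing the log-likelihood by sequence length is not required when one query is compared across models, but it keeps scores comparable across queries of different lengths. A 2-fold cross-validation test of the kind reported in the abstract would fit these per-subfamily models on one half of the labeled sequences and call classify on the other half, then swap the halves.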
Keywords :
pattern classification; pharmaceutical industry; prediction theory; probability; proteins; trees (mathematics); G protein-coupled receptor classification; biological process; multidomain local prediction algorithm; pharmaceutical research; prediction model; probabilistic suffix tree; Biological processes; Classification tree analysis; Decision trees; Hidden Markov models; Pharmaceuticals; Prediction algorithms; Predictive models; Proteins; Support vector machine classification; Support vector machines; GPCR protein classification; multi-domain local prediction; probabilistic suffix tree;
Conference_Title :
Computational Intelligence and Bioinformatics and Computational Biology, 2006. CIBCB '06. 2006 IEEE Symposium on
Conference_Location :
Toronto, Ont.
Print_ISBN :
1-4244-0623-4
Electronic_ISBN :
1-4244-0624-2
DOI :
10.1109/CIBCB.2006.330976