DocumentCode :
3409474
Title :
Protein classification into domains of life using Markov chain models
Author :
Zanoguera, Francisca ; De Francesco, Massimo
Author_Institution :
Serono Pharm. Res. Inst., Switzerland
fYear :
2004
fDate :
16-19 Aug. 2004
Firstpage :
517
Lastpage :
519
Abstract :
It has recently been shown that oligopeptide composition allows the proteomes of different organisms to be clustered into the main domains of life. In this paper, we go a step further by showing that, given a single protein, it is possible to predict whether it has a bacterial or eukaryotic origin with 85% accuracy. We obtain this result after ensuring that no important homologies exist between the sequences in the test set and those in the training set. To do this, we model each sequence as a Markov chain. A bacterial model and a eukaryotic model are estimated from the training sets. Each input sequence is then classified by calculating the log-odds ratio of its probability under the two models. By analyzing the resulting models, we extract a set of the most discriminant oligopeptides, many of which are part of known functional motifs.
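The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the chain order, the smoothing scheme, or the decision threshold, so the sketch assumes a first-order chain over the 20 standard amino acids, add-one pseudocounts, a threshold of zero, and no initial-state term. The function names (`train_markov_model`, `log_odds`, `classify`) and the toy sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumption)


def train_markov_model(sequences, pseudocount=1.0):
    """Estimate first-order transition log-probabilities log P(b | a).

    Every transition starts at `pseudocount` (an assumed smoothing choice)
    so that unseen residue pairs never receive zero probability.
    """
    counts = {a: {b: pseudocount for b in AMINO_ACIDS} for a in AMINO_ACIDS}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            # Skip pairs containing non-standard symbols such as X or B.
            if a in counts and b in counts[a]:
                counts[a][b] += 1.0
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: math.log(c / total) for b, c in row.items()}
    return model


def log_odds(seq, model_a, model_b):
    """Return log P(seq | model_a) - log P(seq | model_b), transitions only."""
    score = 0.0
    for a, b in zip(seq, seq[1:]):
        if a in model_a and b in model_a[a]:
            score += model_a[a][b] - model_b[a][b]
    return score


def classify(seq, bacterial_model, eukaryotic_model, threshold=0.0):
    """Label a sequence by the sign of its log-odds score (assumed threshold)."""
    score = log_odds(seq, bacterial_model, eukaryotic_model)
    return "bacterial" if score > threshold else "eukaryotic"


# Toy training data, invented purely to exercise the code.
bacterial_model = train_markov_model(["AAAAAA", "AAAAKA"])
eukaryotic_model = train_markov_model(["KKKKKK"])
```

With these toy models, `classify("AAAA", bacterial_model, eukaryotic_model)` favors the bacterial model and `classify("KKKK", bacterial_model, eukaryotic_model)` favors the eukaryotic one. Summing log-probability differences per transition avoids numerical underflow on long sequences, and the per-transition differences give one simple way to rank residue pairs by how strongly they discriminate between the two models.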
Keywords :
Markov processes; biology computing; microorganisms; molecular biophysics; physiological models; probability; proteins; Markov chain models; bacterial origin; eukaryotic origin; life; log-odds ratio; oligopeptide composition; protein classification; proteome clustering; sequence probability; Amino acids; Archaea; Bioinformatics; Databases; Microorganisms; Peptides; Performance analysis; Proteins; Sequences; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
Print_ISBN :
0-7695-2194-0
Type :
conf
DOI :
10.1109/CSB.2004.1332481
Filename :
1332481