Title :
Enhancing Protein Domain Detection Using Domain Co-occurrence and Domain Exclusion
Author :
Ghouila, Amel ; Gascuel, Olivier ; Yahia, S.B. ; Bréhélin, Laurent
Author_Institution :
Methods & Algorithms for Bioinformatics, LIRMM, Univ. Montpellier 2, Montpellier, France
Abstract :
Among the relevant annotations that can be attributed to a protein, domains occupy a key position. Protein domains are sequential and structural motifs that are found independently in different proteins and in different combinations. One of the most widely used domain schemes is the Pfam database, a collection of protein domains and families. Each Pfam family is represented by a multiple sequence alignment and a Hidden Markov Model (HMM). When a new protein sequence is analyzed, each Pfam HMM is used to compute a score measuring the similarity between the sequence and the domain. If the score exceeds the threshold provided by Pfam for that family, the domain is asserted to be present in the protein. However, when applied to proteins of organisms that are evolutionarily distant from classical model organisms, this strategy may miss several domains. We recently proposed a method, the Co-Occurrence Domain Detection approach (CODD), that improves the sensitivity of Pfam domain detection by exploiting the tendency of domains to appear preferentially with a few other favorite domains in a protein. Here, we propose to integrate domain exclusion information to prune false positive domains that conflict with other domains of the protein. Applied to P. falciparum and L. major proteins, we show that this strategy substantially reduces the proportion of false positives among the new domains predicted by CODD, while preserving as much as possible the sensitivity of the approach.
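The three stages described in the abstract (per-family threshold test, co-occurrence support for sub-threshold hits, exclusion-based pruning) can be illustrated with a deliberately simplified sketch. This is not the authors' CODD implementation: CODD's actual scoring of candidates is not described here, so plain set lookups stand in for it. The names `Hit`, `detect_domains`, `relaxed_cutoff`, the co-occurrence and exclusion pair tables, and the domain labels `DomA` to `DomD` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    domain: str   # hypothetical domain/family label
    score: float  # HMM score of this domain against the protein


def detect_domains(hits, thresholds, relaxed_cutoff, cooccur, exclude):
    """Simplified sketch (not CODD itself) returning (certain, new) domains.

    thresholds:     family -> per-family score threshold (Pfam-style)
    relaxed_cutoff: hypothetical lower bound for considering a weak hit
    cooccur:        set of frozenset pairs assumed to co-occur in proteins
    exclude:        set of frozenset pairs assumed to be mutually exclusive
    """
    # Step 1: classic detection, keep hits at or above the family threshold.
    certain = {h.domain for h in hits if h.score >= thresholds[h.domain]}

    # Step 2: sub-threshold hits become candidates only if their score
    # clears the (hypothetical) relaxed cutoff.
    candidates = {h.domain for h in hits
                  if relaxed_cutoff <= h.score < thresholds[h.domain]}
    candidates -= certain  # a family already certified is not "new"

    # Co-occurrence support: keep a candidate if it is known to co-occur
    # with at least one certified domain of the same protein.
    supported = {d for d in candidates
                 if any(frozenset((d, c)) in cooccur for c in certain)}

    # Step 3: exclusion pruning, drop candidates that conflict with a
    # certified domain.
    new = {d for d in supported
           if not any(frozenset((d, c)) in exclude for c in certain)}
    return certain, new


# Toy example with made-up scores and pair tables.
hits = [Hit("DomA", 30.0), Hit("DomB", 12.0), Hit("DomC", 11.0), Hit("DomD", 3.0)]
thresholds = {"DomA": 25.0, "DomB": 20.0, "DomC": 20.0, "DomD": 20.0}
cooccur = {frozenset(("DomA", "DomB")), frozenset(("DomA", "DomC"))}
exclude = {frozenset(("DomA", "DomC"))}

certain, new = detect_domains(hits, thresholds, 5.0, cooccur, exclude)
print(sorted(certain), sorted(new))  # ['DomA'] ['DomB']
```

In this toy run, `DomB` and `DomC` both fall below their thresholds but co-occur with the certified `DomA`; `DomC` is then pruned because the pair (`DomA`, `DomC`) is listed as exclusive, mirroring the idea that exclusion information trades a little sensitivity for fewer false positives.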
Keywords :
bioinformatics; hidden Markov models; proteins; CODD algorithm; L. major proteins; P. falciparum proteins; Pfam HMM database; Pfam domain detection sensitivity improvement; cooccurrence domain detection approach; domain exclusion information integration; evolutionary distance; false positive domains; hidden Markov model; information annotations; organism proteins; protein families; protein sequence alignment; score computation; sequential motifs; structural motifs; Bioinformatics; Databases; Hidden Markov models; Organisms; Proteins; Sensitivity; Co-occurrence; Domain prediction; HMM; domain exclusion;
Conference_Title :
2012 23rd International Workshop on Database and Expert Systems Applications (DEXA)
Conference_Location :
Vienna
Print_ISBN :
978-1-4673-2621-6
DOI :
10.1109/DEXA.2012.45