Author_Institution :
Dept. of Biol. Sci., Univ. of Maryland, Baltimore, MD, USA
Abstract :
Identifying the functional context for key molecular disruptions in complex diseases is a major goal of modern medicine that will lead to improved preventive clinical approaches, earlier diagnosis, and more effective personalized disease therapies. Most available resources for visualization and analysis of disease mutations are centered on gene analysis and do not leverage information about similarity in the functional context of the mutations. In addition, gene-centric approaches are confounded because genes may share some functional sub-units, or protein domains, but not others. We have built a resource for domain mapping of disease mutations, DMDM, a protein domain database in which each disease mutation can be displayed by its protein domain location. DMDM provides a unique domain-level view in which human coding mutations are mapped to protein domains, highlighting molecular relationships among mutations from different diseases that might not have been discovered with traditional gene-centric visualization tools. We have also developed a statistical method, the domain significance scores (DS-Scores), to assess the significance of disease mutation clusters on protein domains. We applied the DS-Scores to human data and identified domain hotspots in oncogenes, tumor suppressors, and genes associated with Mendelian diseases. Since most proteins need to interact to perform their function, the identification of domains of clinical relevance needs to be complemented by the identification of domain-domain interactions to provide a better understanding of the molecular underpinnings of disease. Computational tools to predict domain-domain interactions provide a detailed molecular view of protein interactions and complement expensive and laborious experimental techniques for identifying such interactions. The evolutionary distances of interacting proteins often display a higher level of similarity than those of non-interacting proteins.
This finding indicates that interacting proteins are subject to common evolutionary constraints, and it constitutes the basis of a method to predict protein interactions known as mirrortree. It has been difficult, however, to identify the direct cause of the observed similarities between evolutionary trees. One possible explanation is the existence of compensatory mutations between the partners' binding sites to maintain proper binding. This explanation, however, has recently been challenged. It has been suggested that the signal of correlated evolution uncovered by the mirrortree method is unrelated to any correlated evolution between binding sites. We have addressed this debate by studying the contribution of binding sites to the correlation between evolutionary trees of interacting domains. We showed that binding neighborhoods of interacting proteins have, on average, a higher co-evolutionary signal than the regions outside binding sites, although when the binding neighborhood was removed, the remaining domain sequence still contained some co-evolutionary signal. These results provide evidence of the role of compensatory mutations in protein co-evolution and contribute to our understanding of the co-evolution of interacting proteins. Our domain-centric methods have the potential to be incorporated into translational bioinformatics tools for functional characterization of rare and common human variants from large-scale sequencing studies.
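The mirrortree idea mentioned in the abstract can be illustrated with a short sketch. This is not the authors' implementation. It assumes that pairwise evolutionary distance matrices for two domains have already been computed over the same ordered set of species, and it scores the pair by the Pearson correlation of the matrices' upper triangles. The function names and the toy distances are hypothetical.

```python
import math


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / math.sqrt(var_x * var_y)


def mirrortree_score(dist_a, dist_b):
    """Correlate two distance matrices built over the same ordered species.

    Assumption: row/column i refers to the same species in both matrices.
    """
    if len(dist_a) != len(dist_b):
        raise ValueError("matrices must cover the same species")
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Toy (made-up) distances for two domains across four species.
domain_a = [
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 0.0, 1.5, 2.5],
    [2.0, 1.5, 0.0, 1.0],
    [3.0, 2.5, 1.0, 0.0],
]
# Second domain evolves twice as fast but with the same tree shape.
domain_b = [[2.0 * d for d in row] for row in domain_a]

score = mirrortree_score(domain_a, domain_b)
```

Under the same assumptions, the binding-site comparison described in the abstract could be approximated by computing one score from distances derived only from binding-neighborhood positions and another from the remaining domain sequence, then comparing the two.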
Keywords :
bioinformatics; diseases; evolution (biological); genetics; molecular biophysics; patient diagnosis; patient treatment; proteins; statistical analysis; tumours; DMDM; DS-Scores; Mendelian diseases; binding sites; coevolutionary signal; compensatory mutations; complex diseases; computational tools; disease diagnosis; disease mutation clusters; domain mapping; domain sequence; domain significance scores; domain-domain interactions; evolutionary constraints; evolutionary distances; evolutionary trees; gene analysis; gene-centric approach; gene-centric visualization tools; human coding mutations; human data; large-scale sequencing; mirrortree method; molecular disruption; molecular underpinning; oncogenes; personalized disease therapy; protein coevolution; protein domain database; protein domain-centric approach; protein interactions; statistical method; translational bioinformatics; tumor suppressors; Bioinformatics; Context; Data visualization; Diseases; Humans; Marine vehicles; Proteins; coevolution; prediction of domain-domain interactions; protein domain; translational bioinformatics;