Title :
Prediction of Specific Protein-DNA Recognition by Knowledge-based Two-body and Three-body Interaction Potentials
Author :
Guijun Zhao ; Carson, M.B. ; Hui Lu
Author_Institution :
Univ. of Illinois at Chicago, Chicago
Abstract :
Gene regulation requires specific protein-DNA interactions. Detecting the short and variable DNA sequences in gene promoter regions to which transcription factors (TFs) bind is a difficult challenge in bioinformatics. Here we have developed two-body and three-body interaction potentials that are able to assess protein-DNA interaction and achieve a higher level of specificity in the recognition of TF-binding sites. The potentials were calculated using experimentally characterized 3-D structures of protein-DNA complexes. We implemented two approaches to evaluate the potentials. In the first, we calculated the Z-score of the potential energy of a true TF-binding sequence compared with 50,000 randomly generated DNA sequences. The second took advantage of the ability of statistical potentials to recognize novel TF-binding sites within the promoter regions of genes. We found that the three-body potential, which accounts for the interaction between a DNA base and a protein residue together with the effect of a neighboring DNA base, had a better average Z-score than the two-body potential. This neighbor effect suggests that the local conformation of DNA plays a critical role in specific residue-base recognition. In all cases, the potentials developed here outperformed published results. The two sets of potentials were tested further by applying them to genome-scale TF-binding site prediction for the CRP protein in E. coli. Out of the 142 cases, the true binding site ranked first (i.e., had the lowest Z-score) in 28%, and ranked in the top 5 in 59%. We show with these results that statistical potentials can be used in genome-scale TF-binding site prediction.
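The first evaluation approach in the abstract (the Z-score of a true binding sequence against randomly generated sequences) can be illustrated with a minimal sketch. The abstract does not give the potential or the exact Z-score formula, so everything here is an assumption: `toy_energy` is a hypothetical mismatch-counting stand-in for the knowledge-based potentials, `z_score` uses the usual (value minus decoy mean) over decoy standard deviation, and the 5-base target and uniform base sampling are chosen only for illustration.

```python
import random
import statistics

BASES = "ACGT"

def toy_energy(seq, target="TGTGA"):
    """Hypothetical energy, NOT the paper's potential: +1 per mismatch to `target`."""
    return float(sum(a != b for a, b in zip(seq, target)))

def z_score(true_seq, energy, n_random=50_000, seed=0):
    """Z-score of energy(true_seq) against uniformly random sequences of equal length."""
    rng = random.Random(seed)
    # Assumption: decoys are drawn with equal probability for each base.
    decoys = ["".join(rng.choice(BASES) for _ in true_seq) for _ in range(n_random)]
    energies = [energy(s) for s in decoys]
    mu = statistics.fmean(energies)      # mean decoy energy
    sigma = statistics.pstdev(energies)  # standard deviation of decoy energies
    return (energy(true_seq) - mu) / sigma

# Fewer decoys than the 50,000 in the abstract, to keep the demo fast.
z = z_score("TGTGA", toy_energy, n_random=5000)
print(f"Z-score of the true site: {z:.2f}")
```

Under this convention a more negative Z-score means the true site has a lower energy than typical random sequences, which matches the abstract's statement that the best-ranked site has the lowest Z-score.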
Keywords :
DNA; biochemistry; genetics; molecular biophysics; molecular configurations; proteins; statistical analysis; 3-D structures; DNA conformation; DNA sequences; E. coli; Z-score; bioinformatics; gene promoter; gene regulation; knowledge-based interaction potentials; protein-DNA recognition; residue-base recognition; three-body interaction potentials; transcription factors; two-body interaction potentials; Amino acids; Bioinformatics; Crystallography; DNA; Genomics; Nuclear magnetic resonance; Protein engineering; Protocols; Sequences; Statistical analysis; Binding Sites; DNA; DNA Replication; Protein Binding; Proteins; Reproducibility of Results;
Conference_Title :
Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE
Conference_Location :
Lyon
Print_ISBN :
978-1-4244-0787-3
DOI :
10.1109/IEMBS.2007.4353467