مرکز منطقه ای اطلاع رساني علوم و فناوري - Protein sequence design based on the topology of the native state structure

Abstract :

Computational design of sequences for a given structure is generally studied by exhaustively enumerating the sequence space or by searching in such a large space, which is prohibitively expensive. However, we point out that the protein topology has a wealth of information, which can be exploited to design sequences for a chosen structure. In this paper, we present a computationally efficient method for ranking the residue sites in a given native-state structure, which enables us to design sequences for a chosen structure. The premise for the method is that the topology of the graph representing the energetically interacting neighbours in a protein plays an important role in the inverse-folding problem. While our previous work (which was also based on topology) used eigenspectral analysis of the adjacency matrix of interactions for ranking the residue sites in a given chain, here we use a simple but effective way of assigning weights to the nodes on the basis of secondary connections, along with primary connections. This indirectly accounts for the edge weight in the graph and removes degeneracy in the degree. The new scheme needs only a few multiplications and additions to compute the preferred ranking of the residue sites even for structures of real proteins of sizes of a few hundred amino acid residues. We use HP lattice model examples (for which exhaustive enumeration of sequences is practical) to validate our ranking approach in obtaining sequences of lowest energy for any H–P residue composition for a given native-state structure. Some examples of native structures of real proteins are also included. Quantitative comparison of the efficacy of the new scheme with the earlier schemes is made. The new scheme consistently performs better and with much lower computational cost. An optimization procedure is added to work with the new scheme in a few rare cases wherein the new scheme fails to provide the best sequence, an optimization procedure is added to work with the new scheme.