Title of article :
Mining the Structural Genomics Pipeline: Identification of Protein Properties that Affect High-throughput Experimental Analysis
Author/Authors :
Chern-Sing Goh, Ning Lan, Shawn M Douglas, Baolin Wu, Nathaniel Echols, Andrew Smith, Duncan Milburn, Gaetano T. Montelione, Ann M. Stock, Hongyu Zhao, Mark Gerstein
Issue Information :
Journal with serial number, year 2004
Pages :
16
From page :
115
To page :
130
Abstract :
Structural genomics projects represent major undertakings that will change our understanding of proteins. They generate unique datasets that, for the first time, present a standardized view of proteins in terms of their physical and chemical properties. By analyzing these datasets here, we are able to discover correlations between a protein's characteristics and its progress through each stage of the structural genomics pipeline, from cloning, expression, purification, and ultimately to structural determination. First, we use tree-based analyses (decision trees and random forest algorithms) to discover the most significant protein features that influence a protein's amenability to high-throughput experimentation. Based on this, we identify potential bottlenecks in various stages of the structural genomics process through specialized “pipeline schematics”. We find that the properties of a protein that are most significant are: (i) whether it is conserved across many organisms; (ii) the percentage composition of charged residues; (iii) the occurrence of hydrophobic patches; (iv) the number of binding partners it has; and (v) its length. Conversely, a number of other properties that might have been thought to be important, such as nuclear localization signals, are not significant. Thus, using our tree-based analyses, we are able to identify combinations of features that best differentiate the small group of proteins for which a structure has been determined from all the currently selected targets. This information may prove useful in optimizing high-throughput experimentation. Further information is available from .
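The tree-based analysis mentioned in the abstract rests on a simple idea: a decision tree picks, at each node, the feature and threshold that best separate the classes. The sketch below illustrates that single step with Gini impurity, which is one common splitting criterion and not necessarily the one the authors used. The feature names (`pct_charged`, `length`), the toy values, and the solved/unsolved labels are all invented for illustration and do not reproduce the paper's data or results.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature_names):
    """Return (feature, threshold, impurity_decrease) for the best single split."""
    n = len(rows)
    parent = gini(labels)
    best = (None, None, 0.0)
    for j, name in enumerate(feature_names):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds are midpoints between consecutive observed values.
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            # Weighted impurity of the two child nodes.
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - child
            if gain > best[2]:
                best = (name, t, gain)
    return best


# Invented toy data: (percent charged residues, sequence length) per hypothetical target.
FEATURES = ["pct_charged", "length"]
ROWS = [(30.0, 150), (28.0, 420), (27.0, 200), (26.0, 610),
        (18.0, 180), (17.0, 450), (15.0, 220), (14.0, 590)]
# 1 = structure determined, 0 = not determined (made-up labels, illustration only).
SOLVED = [1, 1, 1, 1, 0, 0, 0, 0]

feature, threshold, gain = best_split(ROWS, SOLVED, FEATURES)
print(feature, threshold, round(gain, 3))
```

In this toy set the split lands on `pct_charged` because it separates the two groups perfectly while `length` does not. A random forest repeats this kind of split search over many trees built on resampled data and aggregates how much each feature contributes, which is one way to rank features by significance.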
Keywords :
Charged residues , COGS , Hydrophobicity , decision trees , structural genomics
Journal title :
Journal of Molecular Biology
Serial Year :
2004
Record number :
1243353