While the extraction of approximate structural features from protein chemical shifts has become almost routine, the extraction of precise structural features has not. In fact, the inherently complex geometric, dynamic and electronic dependencies of protein chemical shifts have made the calculation of precise chemical shifts from protein structures, or the calculation of precise structures from chemical shifts, a significant challenge for more than 40 years (Sternlicht and Wilson 1967). For the specific task of calculating chemical shifts from structure (i.e. protein chemical shift prediction), at least two different routes have emerged. One is based on sequence/structure alignment against chemical shift databases (i.e. sequence-based methods) and the other is based on directly calculating chemical shifts from atomic coordinates (i.e. structure-based methods).
A more recent extension to standard sequence-based shift prediction methods is SPARTA (Shen and Bax 2007). Rather than looking for global similarity, as is done with SHIFTY, SPARTA assesses similarity over a much smaller sequence range (just three residues). To predict chemical shifts for a given query protein, each tripeptide in the query structure is searched against the SPARTA tripeptide database and scored on the basis of its sequence and torsion angle (ϕ, ψ, and χ1) similarity. This information is combined with additional structural information (H-bond effects and ring current effects) to calculate a final set of chemical shifts. SPARTA and its successor SPARTA+ (Shen and Bax 2010) have proven to be remarkably accurate, especially for predicting 13C and 15N backbone shifts.
All the aforementioned methods are capable of predicting protein chemical shifts with reasonably high accuracy. As a rule, SHIFTX, SHIFTY, CamShift and SPARTA perform better than PROSHIFT, SHIFTS, SHIFTCALC and CheSHIFT. Nevertheless, it appears that sequence-based approaches, under certain circumstances, perform better than structure-based approaches, and vice versa. This suggests that by combining the strengths of both approaches, it may be possible to produce a hybrid method that exceeds the performance of any single sequence-based or structure-based method. Here we describe just such a hybrid method, called SHIFTX2. In particular, SHIFTX2 combines many of the structure-based concepts originally introduced in SHIFTX (Neal et al. 2003) with the sequence-based concepts introduced with SHIFTY (Wishart et al. 1997). By making use of a much larger and higher quality training set in combination with a number of other enhancements (using advanced machine learning techniques, employing more structural parameters), we substantially improved the performance of the structure-based component (now called SHIFTX+). Likewise, by using an improved sequence/shift database and by making use of local, instead of global, sequence alignment techniques, we were also able to make substantial improvements to the performance of the sequence-based component (now called SHIFTY+). By carefully combining the algorithms for SHIFTX+ and SHIFTY+ we were able to create the hybrid program called SHIFTX2.
Key to the development of accurate chemical shift predictors is the creation of high quality chemical shift databases. For sequence-based methods it is necessary to develop a large and accurate database of protein sequences and properly referenced protein assignments. For structure-based methods it is critical to develop a large and accurate database of protein structures with correspondingly accurate and comprehensive chemical shift assignments. In developing the database for our sequence-based method (SHIFTY+) we used the chemical shift assignments from RefDB (Zhang et al. 2003). RefDB, which is updated weekly, currently contains 1903 re-referenced protein assignments that are automatically extracted and processed from the BioMagResBank (Seavey et al. 1991).
To properly combine the output from SHIFTX+ and SHIFTY+, we compared their relative performance using various sequence identity cut-offs. We found that SHIFTY+ predictions derived from homologues with 40% (or above) sequence identity were consistently more accurate than SHIFTX+ predictions. Therefore, in the combined SHIFTX2 program, any SHIFTY+ prediction derived from a homologue having >40% sequence identity is combined with the corresponding shift prediction from SHIFTX+. Below this sequence identity cut-off, no SHIFTY+ data is used in making a chemical shift prediction. SHIFTX2 combines the predictions of SHIFTX+ and SHIFTY+ according to the magnitude of the atom-by-atom difference between their predictions. When the difference is sufficiently small, SHIFTY+ overrules SHIFTX+; otherwise the predictions are combined in a simple linear fashion with increasing weight for SHIFTX+ predictions as the difference grows. This combination rule is given by the following equations:
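One form consistent with the description above and with the symbol definitions that follow is sketched here. It is a reconstruction under stated assumptions, not a quotation: the subscripted \( \delta \) symbols are introduced for illustration, and the clamping of w to 0 below SDmin and to 1 above SDmax is assumed from the phrases "SHIFTY+ overrules SHIFTX+" and "increasing weight for SHIFTX+".

\[
d = \frac{\left| \delta_{\mathrm{SHIFTX+}} - \delta_{\mathrm{SHIFTY+}} \right|}{\sigma_{\Delta\delta}}
\]

\[
w =
\begin{cases}
0, & d < \mathrm{SD}_{\min} \\[4pt]
\dfrac{d - \mathrm{SD}_{\min}}{\mathrm{SD}_{\max} - \mathrm{SD}_{\min}}, & \mathrm{SD}_{\min} \le d \le \mathrm{SD}_{\max} \\[8pt]
1, & d > \mathrm{SD}_{\max}
\end{cases}
\]

\[
\delta_{\mathrm{SHIFTX2}} = w\,\delta_{\mathrm{SHIFTX+}} + (1 - w)\,\delta_{\mathrm{SHIFTY+}}
\]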
where \( \sigma_{\Delta\delta} \) is the standard deviation (calculated using the SHIFTX+ training dataset) of the observed secondary chemical shift for a given atom type; d is the magnitude of the difference between the SHIFTX+ and SHIFTY+ predictions relative to this standard deviation; and SDmin and SDmax are two parameters controlling the weight w that we assign to the SHIFTX+ predictions. We experimented with various values of SDmin and SDmax ranging from 0.5 to 5 in increments of 0.5. From these tests we found that the best prediction results were achieved with SDmin = 0.5 and SDmax = 1.5. The resulting blended program (SHIFTX2) is able to function much like a structure-based chemical shift predictor. Hence when a protein structure is completely unfolded, SHIFTX2 biases itself towards SHIFTX+ predictions (large differences between SHIFTX+ and SHIFTY+ predictions), whereas when the protein is near its native structure, SHIFTX2 biases itself towards SHIFTY+ predictions (small differences between SHIFTX+ and SHIFTY+ predictions).
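The blending behaviour described above can be illustrated with a minimal Python sketch. This is not the SHIFTX2 source code: the function names, the argument names, the clamping of w to the range 0 to 1, and the treatment of exactly 40% identity as passing the cut-off are all assumptions made for illustration. Only the 40% cut-off and the values SDmin = 0.5 and SDmax = 1.5 come from the text.

```python
def shiftx_plus_weight(d, sd_min=0.5, sd_max=1.5):
    """Weight w given to the SHIFTX+ prediction (assumed piecewise-linear form).

    d is the SHIFTX+/SHIFTY+ difference in units of the atom-type
    standard deviation. Below sd_min SHIFTY+ overrules SHIFTX+ (w = 0);
    above sd_max the weight is assumed to saturate at 1.
    """
    if d <= sd_min:
        return 0.0
    if d >= sd_max:
        return 1.0
    return (d - sd_min) / (sd_max - sd_min)


def combine_shifts(shiftx_plus, shifty_plus, sigma, identity,
                   identity_cutoff=40.0, sd_min=0.5, sd_max=1.5):
    """Blend one atom's SHIFTX+ and SHIFTY+ predictions (hypothetical helper).

    shiftx_plus, shifty_plus: predicted shifts in ppm (shifty_plus may be None)
    sigma: standard deviation of the secondary shift for this atom type
    identity: percent sequence identity of the SHIFTY+ homologue (or None)
    """
    # Without a sufficiently similar homologue, only SHIFTX+ is used.
    # Whether exactly 40% passes the cut-off is an assumption here.
    if shifty_plus is None or identity is None or identity < identity_cutoff:
        return shiftx_plus

    d = abs(shiftx_plus - shifty_plus) / sigma
    w = shiftx_plus_weight(d, sd_min, sd_max)
    return w * shiftx_plus + (1.0 - w) * shifty_plus
```

With sigma = 1.0 ppm and a homologue at 80% identity, predictions that differ by 0.2 ppm would be resolved entirely in favour of SHIFTY+, predictions that differ by 1.0 ppm would be averaged with equal weight, and predictions that differ by 3.0 ppm would fall back to SHIFTX+.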
SHIFTX2 was written in C, Java and Python and is available as a standalone program, as an online web server and as a VMware version. All of these versions are available online. SHIFTX2 has been compiled and tested on Ubuntu Linux 10.04 LTS; however, if properly configured, the SHIFTX2 program should run under most UNIX-like environments, including Debian GNU/Linux, Mandriva Linux, openSUSE, OpenSolaris, OpenBSD and Mac OS X. Although SHIFTX2 has many more computationally intensive components than the original SHIFTY or SHIFTX programs, a number of code optimizations were implemented to make it sufficiently fast that it could be used in chemical shift refinement or incorporated into chemical-shift-based structure generation programs such as CS23D (Wishart et al. 2008), CSRosetta (Shen et al. 2008) or GeNMR (Berjanskii et al. 2009) without any loss in speed.
To compare the performance of SHIFTX2 with other state-of-the-art shift predictors, we ran our test dataset of 61 proteins on six publicly available chemical shift prediction programs or web servers: SHIFTS, SHIFTX, PROSHIFT, CamShift, SPARTA and SPARTA+. All seven programs (these six plus SHIFTX2) were evaluated on the basis of: (1) their correlation coefficients (between observed and predicted shifts); (2) their root mean square deviation (RMSD); (3) their coverage (proportion of proteins or residues in the test set that were predicted); (4) their comprehensiveness (number of atoms or atom types predicted); and (5) their speed (CPU seconds or processing time to return an answer).
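The two accuracy measures named above can be written out in a few lines of Python. This is only a sketch: it assumes the correlation coefficient is the Pearson coefficient (the text does not say which) and that observed and predicted shifts for one atom type are supplied as equal-length lists in ppm. The function names are hypothetical.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted shifts."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmsd(observed, predicted):
    """Root mean square deviation between observed and predicted shifts (ppm)."""
    n = len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_error / n)
```

Both functions are meant to be applied per atom type (for example, all 13Cα shifts in the test set), since shift ranges and typical errors differ greatly between nuclei.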
Bar graphs showing the correlation coefficients (a) and RMSD (b) between the observed and predicted backbone chemical shifts as measured for seven different chemical shift prediction programs using a standard test set of 61 proteins
While most state-of-the-art protein chemical shift predictors predict backbone chemical shifts, only four attempt to predict a subset of side chain shifts (SHIFTX, SHIFTX2, SHIFTS and PROSHIFT) and only two attempt to predict all possible side chain shifts (SHIFTX2 and PROSHIFT). Given the enormous amount of structural information contained in side chain chemical shifts (especially with respect to the influence of ring current effects and other long-range effects) it is surprising that more effort is not directed towards studying this class of chemical shifts. Indeed, ignoring side chain chemical shifts for proteins is a bit like ignoring side chain NOEs. Certainly most protein structures could not be solved or at least solved accurately without the inclusion of side chain NOEs. Similarly any effort directed at refining or solving protein structures using only backbone chemical shifts would no doubt lead to somewhat middling or ambiguous results.
In addition to comparing or assessing the accuracy (via correlation and RMSD) of these different chemical shift predictors, it is also important to assess their coverage (proportion of proteins or residues that could be predicted), their comprehensiveness (number of atoms or atom types predicted) and their speed (CPU seconds or processing time to return an answer). Somewhat surprisingly we found that a number of popular programs were unable to make predictions for a significant number of residues or protein structures (Table 3). For example, SHIFTS typically makes no predictions for about 10% of backbone 1H atoms and 18% of side chain 1H atoms. CamShift makes no predictions for about 5% of backbone atoms while SPARTA and SPARTA+ make no predictions for about 2 and 0.03% of backbone atoms, respectively. Given the variability in PDB file structures and the difficulty in writing robust PDB file parsers, a small percentage of file reading errors is not entirely unexpected. In other cases, it appears that the programs were specifically designed to ignore certain residues or atom types. Table 3 describes the chemical shift coverage, both in terms of the number of shifts and the number of proteins that could be analyzed by each of the seven programs used in this study. As seen in this table, only SHIFTX and SHIFTX2 achieve near 100% coverage. Note that for the performance comparisons given in Tables 1 and 2, we used only the atoms, residues and/or proteins in the 61-protein test set where all seven programs were able to calculate a chemical shift. Certainly if the unpredicted (i.e. null) shifts were included in the calculations shown in Tables 1 and 2 then the relative performance of SHIFTX2 against most other programs would be somewhat better than reported.
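The coverage measure and the restriction to atoms predicted by every program can be sketched as follows. The data layout is an assumption made for illustration: each program's output is taken to be a dictionary mapping a hypothetical (residue number, atom name) key to a predicted shift, with None standing for a null prediction.

```python
def coverage(predictions, expected_atoms):
    """Fraction of expected atoms for which a program returned a shift."""
    predicted = sum(1 for atom in expected_atoms
                    if predictions.get(atom) is not None)
    return predicted / len(expected_atoms)


def common_atoms(predictions_by_program):
    """Atoms for which every program returned a non-null shift.

    predictions_by_program maps a program name to its prediction dictionary.
    Accuracy comparisons restricted to this set treat all programs equally.
    """
    atom_sets = [
        {atom for atom, shift in predictions.items() if shift is not None}
        for predictions in predictions_by_program.values()
    ]
    return set.intersection(*atom_sets) if atom_sets else set()


# Toy example with made-up values, only to show the data layout.
example = {
    "program_a": {(1, "CA"): 56.1, (1, "HA"): 4.3, (2, "CA"): None},
    "program_b": {(1, "CA"): 56.0, (2, "CA"): 54.2},
}
shared = common_atoms(example)
```

Evaluating correlation and RMSD only over the shared set avoids penalizing or rewarding a program for atoms it did not attempt, which is why null predictions have to be handled before any accuracy statistic is computed.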