wwPDB: NMR validation report user guide

last updated: 07 October 2020

The wwPDB NMR validation reports are prepared according to the recommendations of the wwPDB NMR Validation Task Force (VTF; Montelione et al., 2013) and reuse common elements from the equivalent X-ray reports (Read et al., 2011; Gore et al., 2012; Gore et al., 2017). The NMR reports summarise the quality of the structure and highlight specific concerns by considering the atomic model and the chemical shift data. Analysis of the NMR restraints will be added in the future.

The title page shows some information about the entry deposition as well as the names and version numbers of the software tools and reference information used to produce the report. The title page also shows the type of the report (whether it is preliminary, confidential or for a publicly released PDB/EMDB entry) and its length. For more details, see the FAQ on report types.

1. Overall quality at a glance

This section provides a succinct "executive" summary of key quality indicators. If there should be serious issues with a structure, this would usually be evident from this summary.

The metrics shown in the "slider" graphic (see example below) compare several important global quality indicators for this structure with those of previously deposited PDB entries. The comparison is carried out by calculating the percentile rank, i.e. the percentage of entries that are equal to or poorer than this structure in terms of a quality indicator. The global percentile ranks (black vertical boxes) are calculated with respect to all structures available in the PDB archive up to 27 December 2017. The NMR-specific percentile ranks (white vertical boxes) are calculated with respect to NMR entries in the PDB. In general, one would of course like all sliders to lie to the far right in the blue areas (especially for recently determined structures, and in particular the NMR-specific sliders).

Image of sliders for NMR entry
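As a rough illustration of the percentile rank described above, the following Python sketch counts the percentage of reference entries whose score is equal to or poorer than that of the entry. The function name, the higher_is_worse flag and the sample values are hypothetical; the actual wwPDB pipeline may handle ties and reference sets differently.

```python
def percentile_rank(value, reference_values, higher_is_worse=True):
    """Percentage of reference entries that are equal to or poorer than `value`."""
    if not reference_values:
        raise ValueError("reference_values must not be empty")
    if higher_is_worse:
        # For metrics such as the clashscore, a larger value is poorer.
        equal_or_poorer = sum(1 for ref in reference_values if ref >= value)
    else:
        equal_or_poorer = sum(1 for ref in reference_values if ref <= value)
    return 100.0 * equal_or_poorer / len(reference_values)


# Made-up clashscores for a tiny reference set (lower is better).
archive_clashscores = [2.0, 5.0, 8.0, 13.0, 21.0, 34.0, 40.0, 55.0]
rank = percentile_rank(8.0, archive_clashscores)
print(f"percentile rank: {rank:.1f}")
```

With this definition, a higher percentile rank places the slider further to the right.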

Note that if you are not an expert you neither need to know what the various quality criteria measure nor whether the values for an entry are unusual or not. However, for increased understanding, below is a brief description of these key global quality indicators:

Clashscore	This score is derived from the number of pairs of atoms in the model that are unusually close to each other. It is calculated by MolProbity (Chen et al., 2010) and expressed as the number of such clashes per thousand atoms. Further information can be found in the Close contacts section of the report, as described below.
Ramachandran outliers	A residue is considered to be a Ramachandran plot outlier if the combination of its φ and ψ torsion angles is unusual, as assessed by MolProbity (Chen et al., 2010). The Ramachandran outlier score for an entry is calculated as the percentage of Ramachandran outliers with respect to the total number of residues in the entry for which the outlier assessment is available. Further information can be found in the Torsion angles, Protein backbone section of the report, as described below.
Sidechain outliers	Protein sidechains mostly adopt certain (combinations of) preferred torsion angle values (called rotamers or rotameric conformers), much like their backbone torsion angles (as assessed in the Ramachandran analysis). MolProbity considers the sidechain conformation of a residue to be an outlier if its set of torsion angles is not similar to any preferred combination. The sidechain outlier score is calculated as the percentage of residues with an unusual sidechain conformation with respect to the total number of residues for which the assessment is available.
RNA backbone	Like the protein backbone and sidechains, the RNA backbone also adopts certain sets of preferred torsion angle values. Based on statistical analysis of RNA chains in the PDB, MolProbity (Chen et al., 2010) assigns a score per nucleotide for the quality of its backbone. This metric is calculated as the average score of all nucleotides in the entry.

For more information about validation metrics, see Montelione et al. (2013) and the review by Kleywegt (2000).

The slider graph is followed by a table that shows the number of entries upon which the percentile rank calculations are based:

(image of metrics table for NMR)

The next table provides a graphical summary of the quality of all polymeric chains:

example of quality of chain table, NMR #1

There may be green, yellow, orange and red portions in the bar for each chain, indicating the fraction of residues that contain outliers for 0, 1, 2, ≥3 model-only validation criteria, respectively. A grey segment indicates residues present in the sample but not modelled in the final structure. Residues that are not well-defined in the NMR ensemble are represented by a cyan segment (please see section 2 for details). The numeric value for each fraction is shown below the corresponding segment. Values <5% are indicated with a dot.
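The colour scheme can be sketched in Python as follows. This is only an illustration: the function names and the input (a list holding, for each residue, the number of criteria with outliers) are hypothetical, and the grey and cyan segments are left out for brevity.

```python
from collections import Counter

COLOURS = ("green", "yellow", "orange", "red")


def colour_for(outlier_criteria_count):
    """Map the number of criteria with outliers (0, 1, 2, >=3) to a colour."""
    return COLOURS[min(outlier_criteria_count, 3)]


def chain_fractions(outlier_counts):
    """Percentage of residues of each colour for one (non-empty) chain."""
    tally = Counter(colour_for(n) for n in outlier_counts)
    total = len(outlier_counts)
    return {colour: 100.0 * tally[colour] / total for colour in COLOURS}


def label(percentage):
    """Values below 5% are shown as a dot instead of a number."""
    return "." if percentage < 5 else f"{percentage:.0f}%"


# Hypothetical chain of 20 residues: 16 clean, 2 with one criterion flagged,
# 1 with two criteria flagged and 1 with four criteria flagged.
counts = [0] * 16 + [1] * 2 + [2] + [4]
fractions = chain_fractions(counts)
print({colour: label(value) for colour, value in fractions.items()})
```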

The Quality of chain chart shows the fraction of residues in each chain that are flagged as unusual according to the validation criteria used rather than where in the sequence this occurs (the plots are a kind of horizontal pie chart). The following section Residue-property plots provides a graphic showing where in the sequence the issues occur.

NMR Compounds with violations table

The last table (see example above), when present, lists compounds that were perhaps not modelled adequately. The Chirality and Geometry columns show the number of models in the NMR ensemble where the listed compound is modelled with at least 40% of chiral centres or bonds, angles, torsions and rings as outliers. Individual issues and other details are listed in sections 6.4-6.7.

2. Ensemble composition and analysis

This section is unique to NMR entries. Typically, NMR data is incomplete and often inconsistent, and thus the NMR structures include multiple models (collectively called an ensemble), which together fit the NMR data and may indicate variability within the structure. The NMR VTF (Montelione et al., 2013) recommends that the reports identify the well-defined residues in the protein structures as defined by the Cyrange software (Kirchner and Güntert, 2011). Nucleic acids are for now excluded from the analysis of well-defined regions. In some cases, these well-defined residues can be divided into distinct cores, which are internally rigid, but may exhibit some flexibility with respect to each other. The VTF also recommends that the NMR community adopts the medoid model, i.e. the model that is most similar to all the other models (as measured by the backbone RMSD over the residues in each well-defined core), as the representative model of the NMR ensemble (see Montelione et al., 2013 and Tejero et al., 2013). The medoid model calculated on the basis of the largest core is chosen as the overall representative. The authors of the entry, however, can designate a different model as representative, based on other criteria. For the purpose of the wwPDB validation reports, the medoid representative is used if it was possible to identify it. Otherwise, the first model is chosen as the overall representative.
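The idea of a medoid model can be illustrated with a short Python sketch. It assumes that a matrix of pairwise backbone RMSDs over the largest well-defined core is already available, and it takes "most similar to all the other models" to mean the smallest summed RMSD. Both the input values and that selection rule are assumptions made for illustration, not a description of the wwPDB implementation.

```python
def medoid_index(rmsd_matrix):
    """Index of the model with the smallest summed RMSD to all other models."""
    totals = [sum(row) for row in rmsd_matrix]
    # min() returns the first index on ties, i.e. the lowest model number.
    return min(range(len(totals)), key=totals.__getitem__)


# Hypothetical pairwise backbone RMSDs (in Å) between four models.
rmsd = [
    [0.0, 0.8, 1.1, 2.0],
    [0.8, 0.0, 0.6, 1.7],
    [1.1, 0.6, 0.0, 1.5],
    [2.0, 1.7, 1.5, 0.0],
]
representative = medoid_index(rmsd) + 1  # model numbers start at 1
print(f"medoid model: {representative}")
```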

The following table summarises the residue ranges of each well-defined core:

Table showing well defined core ranges

An NMR ensemble can be divided into clusters such that models within each cluster are more similar to each other than between clusters. The analysis is performed over the well-defined regions of proteins by NmrClust software (Kelley et al., 1996). This clustering is summarised in the following table:

Table showing NMR clusters

3. Entry composition

This section summarises the number of unique molecules that are present in the entry, and how they have been modelled. Each unique molecule and its instances (chain id) are described in a table:

table showing entry composition for TROPONIN C

with the following columns:

Mol	The identifier of the molecule (for experts: this is the same as the "entity id" in the mmCIF file of the entry).
Chain	The instance identifier. If there is more than one model present in the entry, the chain is prefixed with a model number.
Residues	The number of residues in the molecule.
Atoms	This tabulates the counts of various element types in the molecule.
Trace	The number of residues in the molecule that have been modelled with a reduced set of atoms. Protein or nucleic acid chains may be modelled with only one or two atoms (e.g. Cα, Cβ, P, an atom in a sugar ring or nucleobase, etc.). Typically, such cases are observed when the experimental data is insufficient to confidently model all atoms.

In addition, each unique oligosaccharide molecule, if present, is represented with a 2D SNFG image (Tsuchiya et al., 2017).

A Ligand Of Interest (LOI) is a ligand that is a subject of the authors' research. Ligands that are flagged by authors during deposition are labelled as LOI.

The Mol and Chain identifiers are also used in other tables in the report.

4. Residue-property plots

This section shows summary plots of quality information for protein, RNA and DNA molecules on a per-residue basis.

The first subsection (4.1) reports residue scores averaged over the ensemble of models, while the second subsection (4.2) shows the scores for the representative model: medoid, if it was determined as described above, or author-specified one otherwise. Full reports show such plots for each member of the NMR ensemble.

In each subsection, there are two graphics shown for each molecule. The first graphic is the same as that shown in section 1: the green, yellow, orange and red segments indicate the fraction of residues with 0, 1, 2 and 3 or more types of model-only quality criteria with outliers, respectively. A cyan segment indicates the fraction of residues that are classed as ill-defined, and a grey segment represents residues that are present in the sample but not modelled in the final structure.

The second graphic shows the sequence annotated by these criteria with outliers in model quality (see example graphic below). The colour-coding described above is used here too. Residues that are ill-defined in the NMR ensemble are shown in cyan with a green, yellow, orange or red underline to indicate their modelling quality according to the above scheme. Consecutive stretches of residues for which no outliers were detected at all are not shown individually, but indicated by a green connector. Residues absent from the final model are shown in grey.

(example of NMR residue-property plot)

In general, the less red, orange, yellow and grey these plots contain, the better. It is important to realise that residues that are outliers on one or more model-validation criteria could be either errors in the model, or reflect genuine features of the structure. Careful analysis of the experimental data (chemical shifts, NOE peaks and restraints) is typically required to make the distinction. Outlier residues that are important for structure or function (e.g., enzymatic residues, interface residues, ligand-binding residues) should be inspected extra carefully (and addressed in a manuscript describing the structure).

The types of model-only quality criteria included in this analysis, and the software used for their calculation are:

bond length and angle outliers (MolProbity, Chen et al., 2010)
chirality outliers (Validation-pack, Feng et al.)
planarity outliers (Validation-pack, Feng et al.)
too-close contacts (MolProbity, Chen et al., 2010 and Validation-pack, Feng et al.)
protein backbone (Ramachandran) outliers (MolProbity, Chen et al., 2010)
protein sidechain torsion angle outliers (MolProbity, Chen et al., 2010)
RNA backbone torsion angle outliers (MolProbity, Chen et al., 2010)
RNA sugar pucker outliers (MolProbity, Chen et al., 2010)

Details of the outliers found for a residue can be found further down the report, in the Model quality section.

5. Refinement protocol and experimental data overview

This section reports what structure refinement methods were used, how the final NMR ensemble was selected, and which software was used for structure determination and refinement.

The overall statistics on chemical shift validation are summarised in the following table. For entries deposited after December 2010, the chemical shift files are taken from the PDB file distribution and are named as follows: <pdb_code>_cs.str. For older entries, the chemical shift data are taken from the BMRB, if a mapping between the PDB and BMRB entries is available. In such cases, the BMRB ID is provided. For all other reports (e.g. those generated during deposition, annotation, etc.), the user-supplied files are used. The table indicates if there were any issues when parsing (e.g. format compliance) or mapping (matching the structure) the chemical shift data, and reports the overall completeness of assignment. These and other details are reported in Section 7.

NMR Refinement protocol and experimental data table

6. Model quality

Quality statistics in this section are calculated using standard compilations of covalent geometry parameters (Engh & Huber, 2001; Parkinson et al., 1996), tools in MolProbity (Chen et al., 2010), Validation-pack (Feng et al.) and the wwPDB chemical component dictionary (CCD). For proteins, all of the below criteria apply only to well-defined ranges of residues as explained in section 2 (Ensemble composition). Nucleic acids, other non-protein polymers and ligands are included in full.

6.1. Standard geometry

This section describes the quality of the covalent geometry for protein, DNA and RNA molecules in terms of bond lengths, bond angles, chirality and planarity for all models and provides averages over the ensemble. There are two tables providing a per-molecule summary and four tables that provide information on (some of) the outliers for each criterion (if any; otherwise the table is omitted).

Summary table for bond lengths and angles

Expected bond length and bond angle values (and standard deviations) for standard amino acids and nucleotides are available in a wwPDB compilation (wwPDB, 2012). The MolProbity Dangle program calculates Z-scores of bond length and bond angle values for each residue in the molecule relative to the expected values. (A Z-score is generally defined as the difference between an observed value and an expected or average value, divided by the standard deviation of the latter.)
The root-mean-square value of the Z-scores (RMSZ) of bond lengths (or angles) is calculated for individual residues and then averaged for each chain and over the whole molecule. RMSZ scores are expected to lie between 0 and 1. An additional averaging of the averages is then performed over the NMR ensemble. For NMR structures, geometry is usually tightly restrained and small values are expected. Individual bond lengths or angles with a Z-score greater than 5 or less than -5 merit inspection.
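The Z-score and RMSZ definitions above can be written out in a few lines of Python. This is a minimal sketch with made-up bond data; it computes the RMSZ over a flat list of bonds and does not reproduce the per-residue, per-chain and per-ensemble averaging performed for the report.

```python
import math


def z_score(observed, ideal, sigma):
    """(observed - expected) divided by the standard deviation of the expected value."""
    return (observed - ideal) / sigma


def rmsz(z_scores):
    """Root-mean-square of a list of Z-scores."""
    return math.sqrt(sum(z * z for z in z_scores) / len(z_scores))


# Hypothetical bonds: (observed length, ideal length, standard deviation), in Å.
bonds = [(1.53, 1.53, 0.02), (1.35, 1.33, 0.01), (1.40, 1.46, 0.01)]
z_values = [z_score(*bond) for bond in bonds]
outliers = [z for z in z_values if abs(z) > 5]
print(f"RMSZ = {rmsz(z_values):.2f}, #|Z|>5 = {len(outliers)}")
```

In this made-up example only the third bond, with a Z-score close to -6, would be listed as an outlier.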

The bond/angle summary table:
NMR Summary table for bond lengths and angles
has the following columns:

Mol	The molecule identifier
Chain	The instance identifier.
Bond lengths	The RMSZ sub-column gives the Root Mean Squared Z score of all bond lengths analyzed. ‡ The #|Z|>5 sub-column provides the number of bond lengths that have a Z-score > 5 or < -5 in comparison to the total number of bonds analyzed. † ‡
Bond angles	The RMSZ sub-column gives the Root Mean Squared Z score of all bond angles analyzed. ‡ The #|Z|>5 sub-column provides the number of bond angles that have a Z-score > 5 or < -5 in comparison to the total number of angles analyzed. † ‡

† The percentage of outliers is listed in parentheses.
‡ The figure after ± is the standard deviation of the value across the NMR ensemble.

Summary table for chirality and planarity

Deviations from expected chirality and planarity in the model are calculated by Validation-pack (Feng et al.).
Chiral centres for all compounds occurring in the PDB are described in the chemical component dictionary. Chirality can be assessed in a number of ways, including calculation of the chiral volume, e.g. for the Cα of amino acids this is 2.6 or -2.6 Å³ for L or D configurations, respectively. If the sign of the computed volume is incorrect, the handedness is wrong. If the absolute volume is less than 0.7 Å³, the chiral centre has been modelled as a planar moiety, which is very likely to be erroneous. Chirality deviations are summarised per chain.
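To make the chiral volume test more concrete, here is a hedged Python sketch. It assumes that the chiral volume is computed as the signed triple product of the vectors from the chiral centre to three of its substituents, which is a common convention but is not spelled out in this guide. The choice and order of the three atoms determine the sign, so the function names, the expected_sign argument and the coordinates below are all hypothetical.

```python
def chiral_volume(centre, a, b, c):
    """Signed triple product (a - centre) . ((b - centre) x (c - centre)), in Å^3."""
    ax, ay, az = (a[i] - centre[i] for i in range(3))
    bx, by, bz = (b[i] - centre[i] for i in range(3))
    cx, cy, cz = (c[i] - centre[i] for i in range(3))
    return (ax * (by * cz - bz * cy)
            + ay * (bz * cx - bx * cz)
            + az * (bx * cy - by * cx))


def classify(volume, expected_sign, planar_cutoff=0.7):
    """Apply the two checks from the text: near-planar first, then handedness."""
    if abs(volume) < planar_cutoff:
        return "planar"
    if (volume > 0) != (expected_sign > 0):
        return "wrong handedness"
    return "ok"


# Made-up coordinates (Å): three substituents along the axes around a centre.
centre = (0.0, 0.0, 0.0)
a, b, c = (1.5, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 1.5)
print(classify(chiral_volume(centre, a, b, c), expected_sign=+1))
```

Swapping any two of the three substituents flips the sign of the volume, which is how an inverted centre shows up in this kind of test.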
Three kinds of potential planarity deviations are assessed:
- Sidechain: Certain groups of atoms in protein sidechains and nucleotide bases are expected to be in the same plane. An atom's deviation from planarity is calculated by fitting a plane through these atoms and then calculating the distance of each individual atom from the plane. Expected values of such distances have been pre-calculated from data analysis (wwPDB, 2012). If an atom is modelled more than six times farther from the plane than the pre-calculated value, the residue is flagged as having a sidechain planarity deviation.
- Peptide: A deviation is flagged if the omega torsion angle of a peptide group differs by more than 30° from the values expected for a proper cis or trans conformation (0° and 180°, respectively).
- Main chain: The N atom of an amino acid residue is expected to be in the same plane as the Cα, C, and O atoms of the previous residue. If it is out of plane by more than 10°, this is flagged as a planarity deviation.
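The peptide criterion in the list above lends itself to a small worked example. The Python sketch below measures how far an omega torsion angle lies from the nearer of the ideal cis (0°) and trans (180°) values and applies the 30° cutoff. The function names are hypothetical, and angles are assumed to be given in degrees.

```python
def omega_deviation(omega_degrees):
    """Smallest angular distance (degrees) from an ideal cis (0) or trans (180) omega."""
    wrapped = omega_degrees % 360.0          # map into [0, 360)
    to_cis = min(wrapped, 360.0 - wrapped)   # distance to 0 degrees
    to_trans = abs(wrapped - 180.0)          # distance to 180 degrees
    return min(to_cis, to_trans)


def is_peptide_planarity_outlier(omega_degrees, tolerance=30.0):
    """Flag omega values that differ by more than `tolerance` from cis or trans."""
    return omega_deviation(omega_degrees) > tolerance


for omega in (-175.0, 20.0, 140.0, 90.0):
    print(omega, omega_deviation(omega), is_peptide_planarity_outlier(omega))
```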

Outlier listing detailed tables

Where outliers exist, up to five for each category are listed in a table in the Summary Report, whereas the Full Report will list all the outliers found. For bond lengths and bond angles, the worst outliers are reported.

All the different outlier tables have the following columns in common:

Mol	The molecule identifier.
Chain	The instance identifier
Res	The residue number. Where applicable, an insertion code and alternative conformation identifier are specified as well.
Type	The residue name.
Models (Total)	Total number of occurrences in the NMR ensemble for the given outlier.

The following columns are specific to the bond length and bond angle outlier tables:

Atoms	names of atoms involved in the bond or angle.
Z	The worst Z-score of the bond length or angle for a given outlier over the NMR ensemble.
Observed	The observed value of the bond length or angle for the worst instance over the ensemble.
Ideal	The ideal value of the bond length or angle.
Models (Worst)	The model number in the NMR ensemble for which the outlier has the worst Z-score.

The following column is specific to the chirality outliers table:

Atom	The name of the atom that is assessed to have an unusual chirality (see above for details of chirality assessment).

The following column is specific to the planarity outliers table:

Group	The planarity deviation type, i.e. sidechain, main chain or peptide as described above.

6.2. Too-close contacts

This section provides details about too-close contacts between pairs of non-bonded atoms where there is an unfavorable steric overlap of van der Waals shells (clashes).

All-atom contacts are calculated by the Reduce and Probe programs within MolProbity (Word et al., 1999; Chen et al., 2010). This method was developed to quantify the detailed non-covalent fit of atomic interactions within or between molecules (H-bonds, favorable van der Waals, and steric clashes). Since most such interactions involve H atoms on one or both sides, all hydrogens must be present or added (Reduce optimizes rotation of OH, SH, NH3, etc. within H-bond networks, but methyls stay staggered). At present, in order to ensure comparable scores between NMR and X-ray, hydrogen atoms are removed from the analysed NMR structure, and replaced by a different set placed by Reduce in idealised and optimized nuclear-H positions. All-atom unfavorable overlaps ≥0.4Å are then identified as clashes, using van der Waals radii tuned for the nuclear H positions suitable for NMR (rather than the electron-cloud H positions suitable for X-ray). Ill-defined regions of proteins are excluded from the analysis, thus if an atom from an ill-defined region is involved in a clash (even with an atom from a well-defined core), such a clash is not counted. MolProbity then calculates an all-atom clashscore, which is defined as the number of clashes per 1000 atoms (including hydrogens). Percentile scores of the clashscore are also computed, to allow assessment of how the structure compares to the rest of the archive.
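The arithmetic behind the clashscore can be sketched in Python as below. This is not how Reduce and Probe work internally. It assumes a pre-computed list of non-bonded contacts (already restricted to well-defined regions), each given as an interatomic distance plus two van der Waals radii. The radii, distances and function names are made up, and the sign convention (overlap positive when the spheres interpenetrate) is an assumption.

```python
def clash_overlap(distance, radius_1, radius_2):
    """Overlap of two van der Waals spheres in Å (positive when they interpenetrate)."""
    return radius_1 + radius_2 - distance


def clashscore(contacts, n_atoms, threshold=0.4):
    """Number of overlaps >= threshold per 1000 atoms (hydrogens included)."""
    n_clashes = sum(
        1 for distance, r1, r2 in contacts
        if clash_overlap(distance, r1, r2) >= threshold
    )
    return 1000.0 * n_clashes / n_atoms


# Hypothetical contacts: (interatomic distance, radius of atom 1, radius of atom 2), in Å.
contacts = [(2.0, 1.7, 1.0), (2.6, 1.7, 1.0), (1.5, 1.0, 1.0)]
print(f"clashscore: {clashscore(contacts, n_atoms=500):.1f}")
```

Here two of the three contacts overlap by at least 0.4 Å, giving 2 clashes per 500 atoms.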

Clashes are summarised in a table, for example:

(image NMR clash summary table)

The columns are labelled:

Mol	The molecule identifier
Chain	The instance identifier
Non-H	The number of non-hydrogen atoms modelled.
H(model)	The number of hydrogen atoms modelled.
H(added)	The number of hydrogen atoms added by MolProbity.
Clashes	The number of clashes in which the atoms in this instance of the molecule are involved, followed by a clashscore for the given chain. Both numbers are averaged over the ensemble.

If there are clashes, a table with details will then be given:

(image nmr table showing individual clashes)

the table has the following columns:

Atom-1	The molecule identifier, instance identifier, residue number, residue name and atom name for the first atom. Where applicable, the chain identifier is prefixed with the model number and an alternative conformation identifier is shown as a suffix to the atom name.
Atom-2	Identifies the second atom in the clash.
Interatomic distance	The distance between Atom-1 and Atom-2 in Å.
Clash overlap	The "magnitude" of the clash as assessed by MolProbity. The MolProbity "magnitude" of a clash is defined as the difference between the observed interatomic distance and the sum of the van der Waals radii of the atoms involved (Chen et al., 2010). The radii used are tuned for use with nuclear H positions suited for NMR (rather than the electron-cloud H positions used for X-ray).
Models (Worst)	The model in which a given clash has the worst magnitude.
Models (Total)	The total number of models in which the given clash occurs.

In a Summary Report up to five of the worst clashes are listed in the table, whereas in a Full Report all the clashes are listed.

Please see the FAQs on: "Why are there clashes reported between hydrogen atoms that are not present in the deposited model?" and "What to do about reported clashes?"

6.3. Torsion angles

6.3.1. Protein backbone

This section is populated if there are protein molecules present in the entry. The conformation of a protein backbone can be described by a pair of torsion angles (phi, psi) per residue (the remaining torsion angle, omega, is usually 180°). Ramachandran plots show the combinations of phi-psi values in a structure and typically compare these to a distribution of commonly observed values in high-resolution crystal structures. MolProbity's Ramachandran plots are residue-type specific, derived from a high-quality subset of protein X-ray structures and divided into favoured, allowed and outlier regions. Favoured and allowed regions are defined to be the regions that include 98% and 99.95%, respectively, of the residues in the high-quality data (see Chen et al., 2010 for more details).

This section contains a summary of the analysis of the backbone torsion angles phi and psi by MolProbity.

example of protein backbone NMR summary table
The summary table contains the following columns:

Mol	The molecule identifier
Chain	The instance identifier
Analysed	The first number here is the number of residues in the chain for which MolProbity output is available. The second number is the total number of residues in the chain. Phi and psi angles cannot be analysed for terminal residues, non-standard residues or for residues with incompletely modelled main chain.
Favoured, Allowed, Outliers	The number (and percentage) of residues in the favoured, allowed and outlier regions respectively, of the residue-specific phi-psi plots. These numbers are averaged over the NMR ensemble.
Percentiles	The percentile score based on the percentage of Ramachandran outliers in the chain. These are given relative to the whole archive (first value) and relative to NMR structures (second value). The colours around the percentile values correspond to the slider positions in the Overall quality section of the report, as described above.

Where Ramachandran outliers exist, up to five outlier residues are listed in a table in the Summary Report, whereas the Full Report will list all the outliers found. The table is sorted by frequency of occurrence in the NMR ensemble.
example of protein backbone NMR outlier table
It has the following columns:

Mol	The molecule identifier
Chain	The instance identifier
Res	The residue number
Type	The residue name
Models (Total)	How often in the NMR ensemble a given residue falls into disallowed/outlier region of the Ramachandran plot.

6.3.2. Protein sidechains

Protein sidechain conformation can be described by the chi torsion angles. Depending on residue type, these angles adopt certain preferred sets of values (also termed rotamers or rotameric conformers). Based on analysis of high-quality X-ray entries in the PDB, MolProbity assesses whether a sidechain is similar to one of the preferred sets of torsion angles, or is an outlier (see Chen et al., 2010 for more details). This section is based on MolProbity analysis of sidechains.

example of protein sidechain NMR summary table

The summary table summarises the sidechain outliers and has the following columns:

Mol	The molecule identifier
Chain	The instance identifier
Analysed	The first number here is the number of residues in the chain which were analysed by MolProbity. The second number is the total number of residues in the chain. Chi torsion angles cannot be analysed for non-standard residues or for residues with incompletely modelled sidechains.
Rotameric, Outliers	The number (and percentage) of residues with favoured, and unusual chi torsion angles respectively.
Percentiles	The absolute and relative percentile scores based on the percentage of sidechain outliers in the chain. These are given relative to the whole archive (first value) and relative to NMR structures (second value). The colours around the percentile values correspond to the slider positions in the Overall quality section of the report, as described above.

Where outliers exist, up to five are listed in a table in the Summary Report, whereas the Full Report will list all the outliers found. The table is sorted by frequency of occurrence in the NMR ensemble.

example of protein sc NMR outlier table

It has the following columns:

Mol	The molecule identifier
Chain	The instance identifier
Res	The residue number
Type	The residue name
Models (Total)	How often in the NMR ensemble a given residue is deemed non-rotameric.

6.3.3. RNA

This section describes the quality of RNA chains using MolProbity’s analysis of ribose sugar puckers and rotameric nature of "suites" of backbone torsion angles (see Richardson et al., 2008, and Chen et al., 2010 for details). A suite consists of the torsion angles between the sugars in two RNA nucleotides and is identified by the 3' nucleotide.

example of NMR RNA summary table

The summary table summarises the geometrical quality of an RNA chain using the following columns:

Mol	The molecule identifier
Chain	The instance identifier
Analysed	The first number here is the number of backbone suites for which analysis was carried out, and the latter number is the total number of nucleotides. The former is a smaller number because a suite is not defined at 5'-end, or a suite might be incompletely modelled.
Backbone outliers	The percentage of nucleotide suites in the chain that MolProbity identified as outliers. The value is averaged over the NMR ensemble.
Pucker outliers	The percentage of nucleotides in the chain that MolProbity identified as sugar pucker outliers. These are nucleotides where the strong correlation between sugar pucker and distance between the glycosidic bond vector and the following phosphate is violated. The value is averaged over the NMR ensemble.
Suiteness	The overall suiteness parameter as defined by MolProbity. The value is averaged over the NMR ensemble.

Where backbone or pucker outliers exist, up to five are listed in a table in the Summary Report, whereas the Full Report will list all the outliers found. The table is sorted by frequency of occurrence in the NMR ensemble.

example of NMR RNA summary table

Both tables have the following columns:

Mol	The molecule identifier
Chain	The instance identifier
Res	The residue number.
Type	The residue name.
Models (Total)	How often in the NMR ensemble a given nucleotide is deemed a backbone or pucker outlier.

6.4-6.7. Non-standard residues in protein, DNA, RNA chains; Carbohydrates; Ligand geometry; Other polymers

These sections analyse the geometry of:

Non-standard amino acids within proteins and non-standard nucleotides within DNA or RNA
Carbohydrates
Ligands
Other polymers

Bond lengths, bond angles, acyclic torsions and isolated rings are assessed using the Mogul program (Bruno et al., 2004) by comparison with preferred molecular geometries derived from high-quality, small-molecule structures in the Cambridge Structural Database (CSD). Chirality is assessed by Validation-pack (Feng et al.).

There are three summary tables providing a per-molecule overview and detailed tables that provide information on (some of) the outliers for each criterion (if any; otherwise the table is omitted).

Summary tables for bond lengths and angles

A Z-score is calculated for each bond length and bond angle in the molecule. (A Z-score is generally defined as the difference between an observed value and an expected or average value, divided by the standard deviation of the latter.) Individual bond lengths or angles with a Z-score less than -2 or greater than 2 merit inspection.

The root-mean-square value of the Z-scores (RMSZ) of bond lengths (or angles) is calculated for the whole molecule. RMSZ scores are expected to lie between 0 and 1. For low-resolution structures, geometry should be tightly restrained and small values are expected. For very high-resolution structures, values approaching 1 may be attained. Values greater than 1 indicate over-fitting of the data.

At least 20 examples were required for each bond length and bond angle to be assessed.

example of NMR ligand bond lengths summary
example of NMR ligand bond angles summary

The bond/angle summary tables have the following columns:

Mol	The molecule identifier.
Type	The residue name.
Chain	The instance identifier.
Res	The residue number.
Link	The identifier(s) of the molecule(s) to which the residue is linked, e.g. by a covalent bond, salt bridge etc.
Bond lengths (or angles)	This column is subdivided into three sub-columns:
- Counts: Three values are given: the number of bonds (or angles) analysed, the number of bonds (or angles) modelled in the residue and the number of bonds (or angles) defined in the PDB chemical component dictionary. The number of bonds (or angles) analysed may be less than the number observed due to the absence of comparable fragments in the Cambridge Structural Database.
- RMSZ: The root-mean-square value of the Z-scores (RMSZ) of all bond lengths (or angles). The values are averaged over the NMR ensemble.
- #|Z|>2: The number of bond lengths or bond angles that have a Z-score of less than -2 or greater than 2, compared to the total number of bonds or angles that have sufficient matches in the CSD. The number of outliers within the molecule is also listed as a percentage in parentheses. Both values are averaged over the NMR ensemble.

Summary table for chirality, torsions and rings

For acyclic torsion angles, Mogul provides a local density measure: the ratio of the number of incidences in the Cambridge Structural Database within 10 degrees of the torsion angle in question to the total number of incidences of that torsion angle in the Cambridge Structural Database. If this ratio is less than 5%, the torsion angle is considered an outlier.

For isolated rings, Mogul compares the given ring with comparable rings in small-molecule structures in the Cambridge Structural Database and calculates an RMSD value based on the corresponding constituent torsion angles for each comparable ring. Both the mean and the minimum of these RMSDs have to be above 60° for the ring to be flagged as an outlier.

At least 15 examples were required for each torsion angle and ring to be assessed.
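The torsion and ring criteria above can be sketched as follows. This is an illustration under stated assumptions, not the Mogul implementation: the function names are hypothetical, the comparison values would in practice come from the Cambridge Structural Database, and the treatment of values exactly at a boundary is a guess.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def torsion_is_outlier(query, csd_torsions, window=10.0, cutoff=0.05, min_examples=15):
    """True/False for outlier status, or None if there are too few examples to assess."""
    if len(csd_torsions) < min_examples:
        return None
    near = sum(1 for t in csd_torsions if angular_difference(query, t) <= window)
    local_density = near / len(csd_torsions)
    return local_density < cutoff

def ring_is_outlier(rmsds, threshold=60.0, min_examples=15):
    """Flag a ring only if both the mean and the minimum RMSD exceed the threshold."""
    if len(rmsds) < min_examples:
        return None
    mean_rmsd = sum(rmsds) / len(rmsds)
    return mean_rmsd > threshold and min(rmsds) > threshold

# Made-up comparison torsions: 12 near 60 degrees and 8 near 180 degrees.
examples = [60.0] * 12 + [180.0] * 8
print(torsion_is_outlier(175.0, examples))   # well populated region
print(torsion_is_outlier(-60.0, examples))   # unpopulated region
```

The angular difference is taken on a circle so that, for example, -175° and 180° are treated as 5° apart.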

Note that the criteria used to flag a ring or torsion angle as an outlier are under development. The current criteria are very conservative. They will be refined following analysis of a large test set of ligands.

Mogul chirality, torsions and rings summary table for NMR

The chirality, torsion angles and rings summary table contains the following columns:

Mol	The molecule identifier.
Type	The residue name.
Chain	The instance identifier.
Res	The residue number.
Link	One or more molecule identifiers to which the residue is linked, e.g. by a covalent bond, salt bridge etc.
Chirals	This column lists: the number of chiral outliers in the chain, the number of chiral centers analysed, the number of these observed in coordinates and the number defined in the PDB chemical component dictionary.
Torsion	This column lists: the number of torsion angle outliers in the chain, the number of torsions analysed, the number of these observed in coordinates and the number defined in the PDB chemical component dictionary.
Rings	This column lists: the number of ring outliers in the chain, the number of rings analysed, the number of these observed in coordinates and the number defined in the PDB chemical component dictionary.

Information tables for bond length, bond angle, chirality, torsion angle and ring outliers

Where outliers exist, up to five for each category are listed in a table in the Summary report, while the Full report lists all of them. Bond length and bond angle outliers are sorted by the Z-score of the worst instance in the NMR ensemble. Other outliers are sorted by the frequency of occurrence in the NMR ensemble.

The outlier tables have the following columns in common:

Mol	The molecule identifier.
Type	The residue name.
Atom(s)	The names of the atoms involved in the bond, angle, torsion angle or ring, or the name of the chiral atom with the unusual deviation.
Chain	The instance identifier.
Res	The residue number.
Models (Total)	How often in the NMR ensemble a given outlier occurs.

The following columns are specific to the bond length and bond angle outliers tables:

Z	The difference between observed and ideal values in terms of standard deviations.
Observed	The observed value of the bond length or angle.
Ideal	The ideal value of the bond length or angle.
Models (Worst)	The model in the NMR ensemble exhibiting the worst Z-score for a given outlier.
	For example:

     Bond length outlier table:
     Mogul bond length outlier table for NMR
     Bond angle outlier table:
     Mogul bond angle outlier table for NMR

Two-dimensional graphical depictions (Smart and Bricogne, 2015) of the Mogul quality analysis of bond lengths, bond angles, torsion angles, and ring geometry are provided for ligands that have been designated as ligands of interest (LOI) by the depositor, regardless of the validation assessment, and for any ligands with a molecular weight greater than 250 Daltons that have outliers flagged in validation.

The color scheme is coded according to the validation result, with green indicating commonly observed values, magenta indicating unusual values, and gray indicating that there was insufficient data to derive a validation score. Unusual values include model quality and electron density fit. For model quality, individual bond lengths or angles with a Z-score less than -2 or greater than 2, torsion angles with a Mogul local density measure of less than 5%, and rings with an RMSD above 60 degrees are considered unusual and colored in magenta.

6.8. Polymer linkage issues

Any chain breaks are identified in this section. Chain breaks are unusual for NMR entries.

7. Chemical shifts validation

This section is unique to NMR validation reports. It is split into subsections, one per distinct list of assigned chemical shifts. Typically, assigned chemical shifts are grouped into separate lists based on experimental conditions (e.g., different pH values). However, sometimes, and especially prior to annotation, depositors may use other criteria (for example, assignments for distinct molecular components forming a complex may be split into individual lists). Each subsection (7.X, where X is the list number) follows the same structure.

7.X.1. Bookkeeping

This subsection gives basic information about each chemical shift list, such as the file name and the title given to the list, together with a table showing the results of parsing the chemical shifts and mapping them to atoms in the structure. The last row of the table gives the number of statistically unusual chemical shift values, as determined by the Validation pack (Feng et al.) software. The validation report will highlight any issues with nomenclature checks between the coordinate and chemical shift files. Any such issues with mapping or parsing (e.g., differing residue identities between chemical shift and structure data) are listed in subsequent separate tables. For publicly released entries, the chemical shift data are taken from the PDB file distribution (file names of the following form: <pdb_code>_cs.str) or from the BMRB if the mapping between PDB and BMRB entries is available (the BMRB ID is given in these cases).

NMR shifts table

Should there be any issues with parsing the data or with mapping between chemical shifts and structure, separate tables listing the issues would appear below. In practice, mapping and parsing issues are only expected in preliminary reports. Depositors are encouraged to address such issues prior to deposition.

In the example below, the chemical shift values need to be corrected as they are not numeric.

NMR shifts value must be a number … table

7.X.2. Chemical shift referencing

PANAV software (Wang et al., 2010) is used to calculate the suggested referencing corrections for Cα, Cβ, C' and N nuclei. The standard error of the correction is estimated by jackknifing, i.e., running the software 10 times, each time omitting 10% of the data. In practice, suggested corrections can be ignored if they are too small (below 0.5 ppm) or too imprecise (below 2 standard errors). A significant difference between the calculated corrections for Cα and Cβ nuclei is also of concern.
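The rule of thumb above for deciding whether a suggested correction deserves attention could be written as the following sketch. The function name, the example values and the handling of values exactly at a threshold are assumptions made for illustration; the jackknife estimation itself is performed by running PANAV and is not reproduced here.

```python
def correction_is_significant(correction_ppm, standard_error_ppm,
                              min_size_ppm=0.5, min_precision=2.0):
    """True if a correction is neither too small nor too imprecise to act on."""
    large_enough = abs(correction_ppm) >= min_size_ppm
    precise_enough = abs(correction_ppm) >= min_precision * standard_error_ppm
    return large_enough and precise_enough

# Hypothetical (correction, standard error) pairs in ppm for three nuclei.
suggested = {"CA": (0.8, 0.2), "CB": (0.3, 0.05), "N": (1.2, 0.9)}
for nucleus, (correction, error) in suggested.items():
    print(nucleus, correction_is_significant(correction, error))
```

In this made-up example only the Cα correction passes both checks: the Cβ correction is below 0.5 ppm, and the N correction is smaller than twice its standard error.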

NMR referencing corrections table

7.X.3. Completeness of the chemical shifts assignment

The following table reports the completeness of the chemical shift assignment for different groups and different nuclei. Only assignable nuclei are taken into consideration (see Montelione et al., 2013, for details). For proteins, backbone is considered as C', N, HN, and Hα*, while sidechains include Cβ, Hβ* and other aliphatic carbons, hydrogens and nitrogens in Asn, Gln, Lys and Arg side chains. The aromatic group includes all nuclei in aromatic rings. Completeness is calculated with respect to the entire well-defined structure. Thus, for multimeric structures with multiple chemical shift lists, the completeness values may need to be added up to arrive at the final measure. Note that hydrogens are seldom assigned chemical shifts in solid-state NMR. The complete report contains a second table reporting the completeness of resonance assignments for the entire molecular assembly, i.e., including well- and ill-defined regions.

NMR Completeness of the chemical shifts assignment table

7.X.4. Statistically unusual chemical shifts

Generally, statistically unusual chemical shift values can be classified into real outliers and artefactual ones. Real outliers can be caused by an unusual strained conformation, by the presence of strong additional magnetic fields (e.g., caused by aromatic rings) or by paramagnetic phenomena. Such cases are of course not errors, and may in fact point to interesting features of the structure. Artefactual outliers, on the other hand, are caused by interpretation errors (e.g., swapped assignments), processing errors (e.g., incorrect spectrum), failure to compensate for spectral aliasing/folding, or referencing offsets. Such cases can often be easily corrected prior to deposition.

NMR Statistically unusual chemical shifts table

The table columns are:

Mol	The molecule identifier.
Chain	The instance identifier.
Res	The residue number.
Type	The residue name.
Atom	The nucleus with an unusual value.
Shift	The measured chemical shift value in ppm.
Expected range	The range (in ppm) where the chemical shift value for a given nucleus is typically observed in the BMRB (±5 standard deviations from the mean value).
Z-score	The difference from the mean value measured in standard deviations.
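To illustrate how the last two columns relate to each other, the sketch below derives a Z-score and an expected range from a mean and a standard deviation. The function name and the reference statistics are invented for the example; they are not actual BMRB values, and this is not the Validation pack code.

```python
def shift_statistics(shift, mean, std, n_sigma=5.0):
    """Return the Z-score of a shift and the mean +/- n_sigma * std range."""
    z_score = (shift - mean) / std
    expected_range = (mean - n_sigma * std, mean + n_sigma * std)
    return z_score, expected_range

# Hypothetical nucleus with a reference mean of 8.0 ppm and standard deviation of 0.5 ppm.
z_score, expected_range = shift_statistics(shift=11.0, mean=8.0, std=0.5)
outside_range = not (expected_range[0] <= 11.0 <= expected_range[1])
print(z_score, expected_range, outside_range)
```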

7.X.5. Random Coil Index (RCI) plots

The random coil index (RCI) is calculated for each protein residue by the RCI software (Berjanskii & Wishart, 2005) based on the measured chemical shifts and on the primary sequence of the protein chain. The higher the bar in the graph, the higher the probability that the given residue is disordered ("random coil-like"). The colour of each bar indicates whether the residue is classified as well-defined (black) or ill-defined (cyan) by the Cyrange software (Kirchner and Güntert, 2011), as described in Section 2 on ensemble composition. Therefore, ideally, residues indicated by black bars should show low RCI values (e.g., an RCI value of 0.02 corresponds to an order parameter S² of approximately 0.9). For further analysis of the meaning of RCI, see Berjanskii & Wishart, 2005 and Berjanskii & Wishart, 2008.

NMR RCI plot

8. NMR restraints analysis

NMR restraints analysis is performed using the BMRB Restraints Analysis package, which was developed by BMRB according to the recommendations of the NMR-VTF (Montelione et al., 2013). This package will provide a summary of uploaded restraints, NOE distance violation statistics, and dihedral-angle violation statistics.

8.1. Conformationally restricting restraints

This section provides the summary of the NMR restraints deposited in either NEF or NMR-STAR format as a unified NMR data upload. Restraints are filtered for duplicate and redundant restraints and then counted in different categories.

Duplicate restraints: Restraints between atoms A and B and between B and A are considered as duplicate restraints.

Redundant restraints: Restraints involving atoms which are already restrained by the covalent structure are considered as redundant. The BMRB restraints analysis package uses the pre-generated list from PDBStat (Tejero et al., 2013) to filter out the redundant restraints.
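The duplicate definition above amounts to treating a restraint as an unordered pair of atoms. A minimal sketch follows, assuming each atom is represented by a hypothetical (chain, residue number, atom name) tuple; it does not cover the redundancy filter, which relies on the PDBStat list.

```python
def remove_duplicates(restraints):
    """Keep the first occurrence of each unordered atom pair."""
    seen = set()
    unique = []
    for atom_a, atom_b in restraints:
        key = frozenset((atom_a, atom_b))  # A-B and B-A give the same key
        if key not in seen:
            seen.add(key)
            unique.append((atom_a, atom_b))
    return unique

# Made-up restraints: the second entry is the first one with the atoms swapped.
restraints = [
    (("A", 10, "HA"), ("A", 14, "HN")),
    (("A", 14, "HN"), ("A", 10, "HA")),
    (("A", 10, "HA"), ("A", 11, "HN")),
]
print(len(remove_duplicates(restraints)))
```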

Restraints are classified into different categories based on the sequence separation of the atoms involved and on the type of restraint, such as hydrogen bonds and disulfide bonds. The table lists the number of conformationally restricting restraints in different categories. Hydrogen and disulfide bond restraints are identified using the atoms involved in defining the restraints and counted separately. The full sequence length is used to estimate the number of restraints per residue. The number of restraints per residue includes all kinds of restraints. If an atom involved in a restraint has no corresponding atom in the coordinate file, then it is counted as an unmapped restraint. Conflicting IUPAC/non-IUPAC atom names between the restraint and coordinate files will also lead to unmapped restraints.

NMR restraints summary table

Long range hydrogen bonds and disulfide bonds are counted as long range restraints while calculating the number of long range restraints per residue.

8.2. Residual restraint violations.

All conformationally restricting restraints are validated against each model in the ensemble. If the measured distance between a pair of atoms in a given model lies between the upper and the lower bound of the corresponding restraint, then the restraint is not violated. If the measured distance in a model lies outside of the boundaries defined by the restraint, then the absolute difference between the measured value and the nearest boundary is reported as the violation value. For restraints defined between groups of atoms (e.g. methyl groups), the 1/r^6 sum (M. Nilges, 1995) is used to calculate the effective distance. Violation values less than 0.1 Å are excluded from the statistics as recommended by the NMR-VTF.
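The two calculations described above can be sketched as follows. The function names are hypothetical, and the effective distance is written in the commonly used r^-6 summation form, (sum of r^-6)^(-1/6), which is assumed here to be what the restraints analysis package applies.

```python
def effective_distance(distances):
    """r^-6 summed effective distance for a restraint between groups of atoms."""
    return sum(r ** -6 for r in distances) ** (-1.0 / 6.0)

def violation(distance, lower, upper, tolerance=0.1):
    """Distance to the nearest bound in Å, or 0.0 if satisfied or below the tolerance."""
    if distance < lower:
        value = lower - distance
    elif distance > upper:
        value = distance - upper
    else:
        value = 0.0
    return value if value >= tolerance else 0.0

# Hypothetical restraint with bounds 1.8 to 5.0 Å involving the three protons of a methyl group.
d_eff = effective_distance([5.9, 6.4, 7.1])
print(round(d_eff, 2), round(violation(d_eff, 1.8, 5.0), 2))
```

With this form of summation the effective distance is never longer than the shortest individual distance, so the closest atom pair dominates the result.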

8.2.1. Average number of distance violations per model

Distance violations are binned as small, medium, and large violations based on the magnitude of their violation values. In each bin the average number of violations per model is calculated by dividing the total number of violations in each bin by the size of the ensemble. The maximum value of the violation in each bin is also reported.
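The per-bin statistics can be illustrated with the sketch below. The bin boundaries used in the example are placeholders chosen for illustration, since this guide does not state them, and the function name and data layout are likewise assumptions.

```python
def average_violations_per_model(violations_by_model, bin_edges):
    """Average count per model and maximum violation for each bin.

    violations_by_model: one list of violation values (in Å) per model.
    bin_edges: bin name -> (low, high), with low inclusive and high exclusive.
    """
    n_models = len(violations_by_model)
    summary = {}
    for name, (low, high) in bin_edges.items():
        values = [v for model in violations_by_model for v in model if low <= v < high]
        summary[name] = {
            "average_per_model": len(values) / n_models,
            "maximum": max(values) if values else None,
        }
    return summary

# Placeholder bin boundaries (assumed, not taken from this guide).
bins = {"small": (0.1, 0.2), "medium": (0.2, 0.5), "large": (0.5, float("inf"))}
# Made-up violation values for an ensemble of two models.
models = [[0.12, 0.3], [0.15, 0.6, 0.25, 0.18]]
summary = average_violations_per_model(models, bins)
print(summary)
```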

Residual distance violations

8.2.2. Average number of dihedral-angle violations per model

A similar approach is taken for the dihedral-angle violations, with the average number of violations per model and the maximum angular violation reported for small, medium and large dihedral-angle violation bins.

Residual angle violations

9. Distance violation analysis

This section summarizes the NOE distance restraints analysis and provides a detailed report on distance violations in each model and in different restraint categories.

9.1. Summary of distance violations.

The table lists the number of violations in different restraint categories (intra-residue, sequential, medium range, long range, inter-chain, hydrogen bonds, disulfide bonds) and sub-categories (backbone to backbone, backbone to sidechain, sidechain to sidechain). In each category, the table provides the total number of restraints, its percentage with respect to the total, the number of violated restraints, and its percentage with respect to that particular category and with respect to the total number of restraints. Restraints that are violated in at least one model are counted as violated and restraints that are violated in all the models are counted as consistently violated.
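The distinction between violated and consistently violated restraints can be sketched as below. The data layout (a mapping from a restraint key to the set of model IDs in which that restraint is violated) and the function name are assumptions made for the example.

```python
def classify_restraints(violated_models, n_models):
    """Return the sets of violated and consistently violated restraint keys."""
    violated = {key for key, models in violated_models.items() if len(models) >= 1}
    consistently = {key for key, models in violated_models.items() if len(models) == n_models}
    return violated, consistently

# Made-up data for an ensemble of three models; keys are (restraint list ID, restraint ID).
violated_models = {
    (1, 1): {1, 2, 3},  # violated in every model
    (1, 2): {2},        # violated in one model only
    (1, 3): set(),      # never violated
}
violated, consistently = classify_restraints(violated_models, n_models=3)
print(len(violated), len(consistently))
```

Every consistently violated restraint is also counted as violated, so the second set is always a subset of the first.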

Distance violation summary

9.1.1. Bar chart: Distribution of distance restraints and violations

The bar chart is generated using the table in section 9.1. This provides graphical visualization of the information provided in section 9.1. The violated and consistently violated portions in each category are shown using different hatch patterns in the bar plot.

Distance violation summary plot

Violated and consistently violated restraints are shown using different hatch patterns in their respective categories. The hydrogen bonds and disulfide bonds are counted in their appropriate category on the x-axis.

9.2. Distance violation statistics for each model

The table provides the number of violations in different categories for each model, along with the mean, median, standard deviation, and maximum value.

Distance violation in each model

9.2.1. Bar graph: Distance violation statistics for each model

The bar graph provides a graphical visualization of the data provided in section 9.2. Here the number of violations is plotted against the model ID and the different restraint categories are represented in different colors. The mean (dot), median (x) and standard deviation (error bar) are plotted in blue against the y axis on the right.

Distance violation in each model plot

9.3. Distance violation statistics for the ensemble

The distance information determined from NMR experiments relates to the time-averaged distance between a given pair of atoms. The most comprehensive way to validate this information would therefore be to measure the average distance between the pair of atoms over the course of a representative molecular dynamics trajectory. Doing so takes the inherent dynamics of biomolecules into account: while a large portion of the observable restraints may be violated for a small fraction of the time, most restraints should be satisfied for the majority of the time. Consequently, if we assume the NMR ensemble to be a faithful representation of the dynamic nature of the biomolecule, we would expect to observe a similar distribution of violations across the static models of the ensemble. A restraint that is violated in all of the models (consistently violated) therefore indicates a problem.

The table lists the number of violated restraints for different sizes of the ensemble. This is calculated by asking "How many restraints are violated in only one model?", "How many are violated in only two models?", and so on. In this process, each restraint is counted once, according to the number of models in which it is violated.
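The counting procedure behind this table can be sketched as follows, assuming a hypothetical mapping from each restraint key to the set of model IDs in which the restraint is violated.

```python
from collections import Counter

def violations_by_model_count(violated_models):
    """Map 'number of models violated' -> 'number of restraints'."""
    counts = Counter(len(models) for models in violated_models.values() if models)
    return dict(sorted(counts.items()))

# Made-up data; keys are (restraint list ID, restraint ID).
violated_models = {
    (1, 1): {1},
    (1, 2): {1, 2},
    (1, 3): {3},
    (1, 4): set(),  # never violated, so not counted
}
distribution = violations_by_model_count(violated_models)
print(distribution)
```

Each restraint contributes to exactly one entry of the result, which is what allows the entries to be read as a distribution over the ensemble.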

Distance violation in the ensemble

9.3.1. Bar graph: Distance violation statistics for the ensemble

The bar graph provides a graphical visualization of table 9.3. One would expect that, as the fraction of the ensemble approaches 100%, the number of violated restraints should go to zero or close to zero.

Distance violation in the ensemble plot

9.4. Most violated distance restraints in the ensemble

This section summarizes the most violated restraints and the average violations in an ensemble. The key (restraint list ID, restraint ID) is used as a unique identifier for specific restraints. Restraints that involve a group of atoms, listed in multiple rows, will have the same key and will be counted as one restraint.

9.4.1. Histogram: Distribution of mean distance violations

For each restraint, the mean violation is calculated over the number of violated models in the ensemble. A histogram (0.05 Å bin size) is generated to show the distribution of the mean violation distance for each restraint in the ensemble.

Most violated histogram

9.4.2. Table: Most violated distance restraints

The table lists the number of violated models and the mean, median, and standard deviation of those violations for each distance restraint when violations are identified in more than one model. This list is ordered by number of violated models and the mean value. The summary PDF report will only list the 10 most violated restraints. The full list can be found in the full PDF version.
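A table of this kind could be assembled as in the sketch below. Several details are assumptions not stated in this guide: the data layout, the function name, the use of the population standard deviation, and sorting in descending order so that the most violated restraints come first.

```python
import statistics

def most_violated(violations, top=None):
    """Rows for restraints violated in more than one model, most violated first.

    violations: restraint key -> list of violation values, one per violated model.
    """
    rows = []
    for key, values in violations.items():
        if len(values) > 1:
            rows.append({
                "key": key,
                "models": len(values),
                "mean": statistics.mean(values),
                "median": statistics.median(values),
                "sd": statistics.pstdev(values),  # assumed: population standard deviation
            })
    rows.sort(key=lambda row: (row["models"], row["mean"]), reverse=True)
    return rows[:top] if top else rows

# Made-up violation values in Å; keys are (restraint list ID, restraint ID).
violations = {
    (1, 1): [0.2, 0.4],
    (1, 2): [0.3, 0.3, 0.3],
    (1, 3): [0.9],          # violated in one model only, so not listed
    (1, 4): [0.5, 0.7],
}
table = most_violated(violations, top=10)
print([row["key"] for row in table])
```

Rows that belong to the same restraint (for example, a restraint between groups of atoms) share one key, so they would need to be merged before calling such a function.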

Most violated table

9.5. All distance violations

This section lists distance violations in every model. The key (restraint list ID, restraint ID) is used as a unique identifier. Restraints that involve a group of atoms, listed in multiple rows, will have the same key and will be counted as one restraint.

9.5.1. Histogram: Distribution of distance violations

A histogram (0.05 Å bin size) shows the distribution of individual violation values in the ensemble.

All violations histogram

9.5.2. Table: All violated distance restraints

Every violation in each model will be listed here, in descending order of the distance violation value. The summary report will only list the 10 most violated restraints. The full list can be found in the full PDF version.

All violations table

10. Dihedral-angle violation analysis.

This section summarizes the analysis of the dihedral-angle restraints when provided from a NEF/NMR-STAR file as a unified NMR data upload.

10.1. Summary of dihedral-angle violations.

The dihedral-angle restraints are counted by dihedral-angle type (e.g., PHI, PSI), and the number and percentage of violations in each type are calculated. Restraints that are violated in at least one model are counted as violated and restraints that are violated in all models are counted as consistently violated.

Dihedral-angle violation summary

10.1.1. Bar chart: Distribution of dihedral-angles and violations.

The bar chart is generated using the table in section 10.1. This provides graphical visualization of the information provided in section 10.1. The violated and consistently violated portions in each angle type are shown using different hatch patterns in the bar plot.

Dihedral-angle violation summary plot

10.2. Dihedral-angle violation statistics for each model

The table provides the number of violations in different dihedral-angle types for each model, along with the mean, median, standard deviation, and the maximum violation value.

Dihedral-angle violation in each model

10.2.1. Bar graph: Dihedral-angle violation statistics for each model

The bar graph provides a graphical visualization of the data provided in section 10.2. Here the number of violations is plotted against the model ID and the different dihedral-angle types are represented in different colors. The mean (dot), median (x), and standard deviation (error bar) are plotted in blue against the y axis on the right.

Dihedral-angle violation in each model plot

10.3. Dihedral-angle violation statistics for the ensemble.

Dihedral-angle restraints are typically derived from chemical shifts and/or J-couplings. Because these measurements involve time-averaged observables, a comprehensive validation of angular restraints would ideally involve analysis of the system over the duration of a representative molecular dynamics trajectory. Therefore, an approach similar to that described in section 9.3 is used here, and the number of violated restraints is calculated for different sizes of the ensemble. If we assume the NMR ensemble is a faithful representation of the dynamic nature of the biomolecule, then we would expect to observe a large number of restraints violated in a small fraction of the ensemble and few or no restraints violated in the majority of the ensemble. If a restraint is violated in all of the models (consistently violated), then it indicates a problem.

Dihedral-angle violation in the ensemble

10.3.1. Bar graph: Dihedral angle violation statistics for the ensemble

The bar graph provides a graphical visualization of table 10.3.

Dihedral-angle violation in the ensemble plot

10.4. Most violated dihedral-angle restraints in the ensemble.

This section summarizes the most violated restraints and the average violations in an ensemble. The key (restraint list ID, restraint ID) is used as a unique identifier for restraints.

10.4.1. Histogram: Distribution of dihedral angle restraints violations

For each restraint, the mean violation is calculated over the number of violated models in the ensemble. A histogram (2° bin size) is generated to show the distribution of the mean angular violation for each dihedral-angle restraint in the ensemble.

Most violated histogram

10.4.2. Table: Most violated dihedral angle restraints

The table lists the number of violated models, and the mean, median and standard deviation of those violations, for each dihedral-angle restraint for which violations are identified in more than one model. This list is ordered by number of violated models and the mean value. The summary PDF report will only list the 10 most violated restraints. The full list can be found in the full PDF version.

Most violated table

10.5. All dihedral-angle violations

This section lists dihedral-angle violations in every model. The key (restraint list ID, restraint ID) is used as a unique identifier for restraints.

10.5.1. Histogram: Distribution of dihedral-angle violations

A histogram (2° bin size) shows the distribution of individual dihedral-angle violation values in the ensemble.

All violations histogram

10.5.2. Table: All dihedral-angle violations

Every violation in each model will be listed here, in descending order of the dihedral-angle violation value. The summary report will list only the top 10 violations, whereas the full PDF report lists all the violations.

All violations table

References

M. V. Berjanskii and D. S. Wishart. A Simple Method To Predict Protein Flexibility Using Secondary Chemical Shifts. J. Am. Chem. Soc., 127:14970–14971, 2005.
M. V. Berjanskii and D. S. Wishart. Application of the random coil index to studying protein flexibility. J. Biomol. NMR, 40:31–48, 2008.
I. J. Bruno, J. C. Cole, M. Kessler, J. Luo, W. D. S. Motherwell, L. H. Purkis, B. R. Smith, R. Taylor, R. I. Cooper, S. E. Harris, and A. G. Orpen. Retrieval of crystallographically-derived molecular geometry information. J. Chem. Inf. Comput. Sci., 44:2133–2144, 2004.
V. B. Chen, W. B. Arendall III, J. J. Headd, D. A. Keedy, R. M. Immormino, G. J. Kapral, L. W. Murray, J. S. Richardson, and D. C. Richardson. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Cryst., D66:12–21, 2010.
R. A. Engh and R. Huber. International Tables for Crystallography, Volume F. Crystallography of Biological Macromolecules., Chapter 18.3 Structure quality and target parameters, pages 382–392. Kluwer Academic Publishers, 2001.
Z. Feng. Validation-pack. https://sw-tools.pdb.org/
S. Gore, S. Velankar and G. J. Kleywegt. Implementing an X-ray validation pipeline for the Protein Data Bank. Acta Cryst., D68:478–483, 2012.
S. Gore, E. S. Garcia, P. M. S. Hendrickx, A. Gutmanas, J. D. Westbrook, H. W. Yang, Z. K. Feng, K. Baskaran, J. M. Berrisford, B. P. Hudson, Y. Ikegawa, N. Kobayashi, C. L. Lawson, S. Mading, L. Mak, A. Mukhopadhyay, T. J. Oldfield, A. Patwardhan, E. Peisach, G. Sahni, M. R. Sekharan, S. Sen, C. H. Shao, O. S. Smart, E. L. Ulrich, R. Yamashita, M. Quesada, J. Y. Young, H. Nakamura, J. L. Markley, H. M. Berman, S. K. Burley, S. Velankar, and G. J. Kleywegt. Validation of Structures in the Protein Data Bank. Structure, 25:1916–1927, 2017.
L. A. Kelley, S. P. Gardner and M. J. Sutcliffe. An Automated Approach For Clustering An Ensemble Of NMR-Derived Protein Structures Into Conformationally-Related Subfamilies. Protein Eng. 11:1063–1065, 1996.
D. K. Kirchner and P. Güntert. Objective identification of residue ranges for the superposition of protein structures. BMC Bioinformatics, 12:170–180, 2011.
G. T. Montelione, M. Nilges, A. Bax, P. Güntert, T. Herrmann, J. S. Richardson, C. D. Schwieters, W. F. Vranken, G. W. Vuister, D. S. Wishart, H. M. Berman, G. J. Kleywegt, and J. L. Markley. Recommendations of the wwPDB NMR Validation Task Force. Structure, 21:1563–1570, 2013.
G. N. Parkinson, J. Vojtechovsky, L. Clowney, A. T. Brünger, and H. M. Berman. New parameters for the refinement of nucleic acid containing structures. Acta Cryst., D52:57–64, 1996.
R. J. Read, P. D. Adams, W. B. Arendall III, A. T. Brunger, P. Emsley, R. P. Joosten, G. J. Kleywegt, E. B. Krissinel, T. Lütteke, Z. Otwinowski, A. Perrakis, J. S. Richardson, W. H. Sheffler, J. L. Smith, I. J. Tickle, G. Vriend, and P. H. Zwart. A new generation of crystallographic validation tools for the Protein Data Bank. Structure, 19:1395–1412, 2011.
J. S. Richardson, B. Schneider, L. W. Murray, G. J. Kapral, R. M. Immormino, J. J. Headd, D. C. Richardson, D. Ham, E. Hershkovits, L. D. Williams, K. S. Keating, A. M. Pyle, D. Micallef, J. Westbrook and H. M. Berman. RNA backbone: consensus all-angle conformers and modular string nomenclature (an RNA Ontology Consortium contribution). RNA, 14:465–481, 2008.
O. S. Smart, and G. Bricogne. Multifaceted Roles of Crystallography in Modern Drug Discovery (G. Scapin, D. Patel and E. Arnold eds.), Achieving High Quality Ligand Chemistry in Protein-Ligand Crystal Structures for Drug Design, pages 165–181. Springer Netherlands, Dordrecht, 2015. https://www.globalphasing.com/buster/wiki/index.cgi?BusterReport
R. Tejero, D. Snyder, B. Mao, J. M. Aramini, G. T. Montelione. PDBStat: A universal restraint converter and restraint analysis software package for protein NMR. J. Biomol. NMR, 56:337–351, 2013.
B. Wang, Y. Wang and D. S. Wishart. A probabilistic approach for validating protein NMR chemical shift assignments. J. Biomol. NMR, 47:85–99, 2010.
S. Tsuchiya, N. P. Aoki, D. Shinmachi, M. Matsubara, I. Yamada, K. F. Aoki-Kinoshita and H. Narimatsu. Implementation of GlycanBuilder to draw a wide variety of ambiguous glycans. Carbohydrate Res., 445:104–116, 2017.
J. M. Word, S. C. Lovell, T. H. LaBean, H. C. Taylor, M. E. Zalis, B. K. Presley, J. S. Richardson, D. C. Richardson. Visualizing and quantifying molecular goodness-of-fit: small-probe contact dots with explicit hydrogen atoms. J. Mol. Biol., 285:1711–1733, 1999.
wwPDB. The standard geometry compilation used in wwPDB validation protocols, 2012.
