wwPDB Welcome to the Worldwide Protein Data Bank

Chemical Component Dictionary

The Chemical Component Dictionary[a] is an external reference file describing all residue and small-molecule components found in PDB entries. It contains detailed chemical descriptions of standard and modified amino acids and nucleotides, small-molecule ligands, and solvent molecules. Each chemical definition includes chemical properties such as stereochemical assignments, chemical descriptors (SMILES and InChI), systematic chemical names, and idealized coordinates (generated with Molecular Networks' Corina or, if Corina runs into problems, with OpenEye's OMEGA).

The dictionary is organized by the 3-character alphanumeric code that the PDB assigns to each chemical component. New chemical component definitions appear in the dictionary when the entries in which they are observed are released in the PDB archive; consequently, the dictionary is updated with each weekly PDB release. The dictionary is regularly reviewed and remediated. Obsoleted components remain in the dictionary, marked with the release status OBS.
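Since every definition records its release status in the item _chem_comp.pdbx_release_status, an obsoleted component can be recognized from its own data block. The Python sketch below assumes the block is available as text and that the item is written as a single-line name/value pair, as in the 4-chlorobenzoic acid example on this page; release_status and is_obsolete are hypothetical helper names.

```python
from typing import Optional


def release_status(block_text: str) -> Optional[str]:
    """Return the _chem_comp.pdbx_release_status value of one data_ block.

    Assumes the item is a single-line name/value pair (not inside a loop_).
    """
    for line in block_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "_chem_comp.pdbx_release_status":
            return parts[1]
    return None  # item not present in this block


def is_obsolete(block_text: str) -> bool:
    # Obsoleted components stay in the dictionary with status OBS.
    return release_status(block_text) == "OBS"


example = """data_174
_chem_comp.id                                    174
_chem_comp.pdbx_release_status                   REL
_chem_comp.pdbx_replaced_by                      ?
"""
print(release_status(example), is_obsolete(example))  # REL False
```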

Users can search and browse the Chemical Component Dictionary using resources such as PDBeChem and Ligand Expo.

The entire Chemical Component Dictionary and the companion dictionary of amino acid protonation variants can be downloaded from the wwPDB ftp site:

Chemical Component Dictionary: mmCIF (plain text) | mmCIF (gz)
Protonation Variants Companion Dictionary: mmCIF (plain text) | mmCIF (gz)

Please note that these files are large and may take a while to download.
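Because the files are large, one option is to read a downloaded copy as a stream and handle one component at a time. The Python sketch below assumes you have already saved the dictionary locally (plain text or gzip-compressed) and that each component starts at a line beginning with data_, as in the example on this page. iter_components is a hypothetical helper, not part of any wwPDB software.

```python
import gzip
from typing import Iterator, List, Optional, Tuple


def iter_components(path: str) -> Iterator[Tuple[str, List[str]]]:
    """Yield (component_id, lines) for each data_ block in a dictionary file.

    Works on plain or .gz files; only one block is held in memory at a time.
    """
    opener = gzip.open if path.endswith(".gz") else open
    comp_id: Optional[str] = None
    lines: List[str] = []
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("data_"):
                if comp_id is not None:
                    yield comp_id, lines
                # The block name after "data_" is the component ID.
                comp_id = line.strip()[len("data_"):]
                lines = [line]
            elif comp_id is not None:
                lines.append(line)
    if comp_id is not None:
        yield comp_id, lines
```

For example, a loop such as for comp_id, lines in iter_components("components.cif.gz") visits each definition in turn (the file name here is only an assumed local name). Memory use then depends on the largest single definition rather than on the size of the whole file.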

The dictionary of protonation variants provides additional nomenclature information for the protonation states of standard amino acids in N-terminal, C-terminal, and free forms, and includes common side-chain protonation states. This extension dictionary uses longer identifier codes to distinguish the various protonation forms of the standard amino acids. For instance, the identifier code ARG_LFOH_DHH12 identifies the arginine variant with a neutral peptide unit and a side chain protonated at NH1. These extended codes do not fit the 3-character limit for the residue identifier in the PDB format, so they do not currently appear in PDB files. In PDB entries, protonated residues are identified by the 3-character code of their parent amino acid; however, the atom nomenclature for protonated forms is taken from the variant dictionary definitions.
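The relationship between the extended identifiers and the 3-character residue names can be illustrated with a small Python sketch. It assumes, based only on the ARG_LFOH_DHH12 example, that the parent amino-acid code is the part of the identifier before the first underscore; parent_code and fits_pdb_residue_field are hypothetical helper names.

```python
def parent_code(variant_id: str) -> str:
    """Return the parent amino-acid code of a protonation-variant ID.

    Assumes the parent code is everything before the first underscore.
    """
    return variant_id.split("_", 1)[0]


def fits_pdb_residue_field(code: str) -> bool:
    # The PDB format allows at most three characters for a residue name.
    return len(code) <= 3


variant = "ARG_LFOH_DHH12"
print(parent_code(variant))                          # ARG
print(fits_pdb_residue_field(variant))               # False
print(fits_pdb_residue_field(parent_code(variant)))  # True
```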

Before the Chemical Component Dictionary was developed, PDB chemical information existed solely in the form of connection tables. This older representation, called the PDB HET dictionary, is still available on the wwPDB FTP site. PDB HET-format dictionary entries for individual components are available at ftp://ftp.wwpdb.org/pub/pdb/data/monomers/.

[a] The Chemical Component Dictionary was formerly called the HET Group Dictionary.

Descriptions of chemical components in mmCIF and PDB formats are provided below.


PDBeChem

PDBeChem[1] offers several ways to search and explore the dictionary:

  • Search for a particular 3-character code
  • Search using part of the name
  • Search for a formula range
  • Search for a substructure
  • Search for a fragment expression

Users can also search by references in macromolecules, molecule classification, and atom energy type.

A generic browsing interface lets users follow the links available from every record to navigate the relationships within the dictionary. For example, a user can follow a link to view the atoms of a ligand and then, for a particular atom, view its bonds, energy types, and so on.

For more information, please see
http://www.ebi.ac.uk/msd-srv/msdchem/ligand/help.htm


Ligand Expo

Ligand Expo, formerly Ligand Depot[2], can be used to navigate the Chemical Component Dictionary. It integrates databases, services, tools, and methods related to small molecules, and allows users to:

  • Search for a chemical component
  • Browse tables of components that contain
    • modified amino acids and nucleotides
    • popular drugs (trade and generic names)
    • common ring systems
  • Review related information in chemical dictionaries and resource files (chemistry, geometry, atom nomenclature, and more)
  • Download model and ideal chemical component coordinates
  • View all instances of a component in released PDB entries

Ligand Expo provides the Chemical Component Dictionary, as well as the individual chemical components within PDB entries, for download in a variety of formats and packages at http://ligand-expo.rcsb.org/ld-download.html.


Chemical Components in mmCIF Format

The mmCIF format groups related data items (tokens) into categories. A category is essentially a table in which each data item is a column and each set of values for those items is a row. The question mark (?) marks an item value as missing. A period (.) indicates that there is no appropriate value for the item or that a value has been intentionally omitted.
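Software that reads these values should keep the two placeholders distinct rather than collapsing both into "empty". A minimal Python sketch, assuming tokens have already been split out of the file and that only the unquoted tokens ? and . carry the special meaning; interpret and INAPPLICABLE are hypothetical names.

```python
from typing import Union


class _Inapplicable:
    """Sentinel for '.': no appropriate value, or value intentionally omitted."""

    def __repr__(self) -> str:
        return "INAPPLICABLE"


INAPPLICABLE = _Inapplicable()


def interpret(token: str) -> Union[None, _Inapplicable, str]:
    """Map one raw mmCIF value token to a Python value.

    '?' -> None (value missing); '.' -> INAPPLICABLE. Call this on the raw
    token, before removing quotes, so that a quoted '?' stays a literal string.
    """
    if token == "?":
        return None
    if token == ".":
        return INAPPLICABLE
    return token


print([interpret(t) for t in ["174", "?", ".", "'?'"]])
# ['174', None, INAPPLICABLE, "'?'"]
```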

Vectors and tables of data may be encoded in mmCIF using a loop_ directive. To build a table, the data item names corresponding to the table columns are preceded by the loop_ directive, and followed by the corresponding rows of data.

A detailed description of the mmCIF syntax and logical structure is available.

In the Chemical Component Dictionary, each chemical component is defined by data items in the following five categories:

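Category                     Summary of category contents
chem_comp                    Component-level data: identifier, name, type, formula, formula weight, release status (Table 1)
chem_comp_atom               Atoms: names, element symbols, charges, aromatic/leaving-atom/stereo flags, model and ideal coordinates (Table 2)
chem_comp_bond               Bonds: bonded atom pairs, bond order, aromatic and stereo flags (Table 3)
pdbx_chem_comp_descriptor    Descriptors: SMILES, InChI and InChIKey strings with the program that generated them (Table 4)
pdbx_chem_comp_identifier    Identifiers: systematic names with the program that generated them (Table 5)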
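Every mmCIF item name carries its category before the period (for example, _chem_comp_atom.atom_id belongs to chem_comp_atom), so the categories used by a definition can be listed directly from its item names. A minimal Python sketch, assuming item names start their lines and that no data value line begins with an underscore; categories is a hypothetical helper name.

```python
from typing import List


def categories(block_text: str) -> List[str]:
    """List the categories used in one data_ block, in order of first use."""
    seen: List[str] = []
    for line in block_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("_") and "." in fields[0]:
            # "_chem_comp_atom.atom_id" -> "chem_comp_atom"
            category = fields[0][1:].split(".", 1)[0]
            if category not in seen:
                seen.append(category)
    return seen


block = """data_174
_chem_comp.id 174
_chem_comp.name "4-CHLORO-BENZOIC ACID"
loop_
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
174 CL4
"""
print(categories(block))  # ['chem_comp', 'chem_comp_bond']
```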

In a PDB entry, the mmCIF category chem_comp is used to describe the chemical components in the file. The chemical name is described in chem_comp.name, chemical formula in chem_comp.formula, and molecular weight in chem_comp.formula_weight.

For example, the mmCIF file for PDB entry 1t5d contains the ligand 4-Chloro-benzoic Acid (ID code: 174):

  data_174
  # 
  _chem_comp.id                                    174 
  _chem_comp.name                                  "4-CHLORO-BENZOIC ACID" 
  _chem_comp.type                                  NON-POLYMER 
  _chem_comp.pdbx_type                             HETAIN 
  _chem_comp.formula                               "C7 H5 Cl O2" 
  _chem_comp.mon_nstd_parent_comp_id               ? 
  _chem_comp.pdbx_synonyms                         ? 
  _chem_comp.pdbx_formal_charge                    0 
  _chem_comp.pdbx_initial_date                     2004-05-07 
  _chem_comp.pdbx_modified_date                    2008-04-29 
  _chem_comp.pdbx_ambiguous_flag                   N 
  _chem_comp.pdbx_release_status                   REL 
  _chem_comp.pdbx_replaced_by                      ? 
  _chem_comp.pdbx_replaces                         ? 
  _chem_comp.formula_weight                        156.566 
  _chem_comp.one_letter_code                       ? 
  _chem_comp.three_letter_code                     174 
  _chem_comp.pdbx_model_coordinates_details        ? 
  _chem_comp.pdbx_model_coordinates_missing_flag   N 
  _chem_comp.pdbx_ideal_coordinates_details        ? 
  _chem_comp.pdbx_ideal_coordinates_missing_flag   N 
  _chem_comp.pdbx_model_coordinates_db_code        1T5D 
  _chem_comp.pdbx_processing_site                  RCSB 

Further information describing this residue (174) is then provided in the Chemical Component Dictionary (see the Example below).
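The formula and formula weight in the chem_comp example are related by simple arithmetic: with standard atomic weights, 7(12.0107) + 5(1.00794) + 35.453 + 2(15.9994) = 84.0749 + 5.0397 + 35.453 + 31.9988 = 156.5664, which rounds to the 156.566 shown. The Python sketch below reads the single-line _chem_comp items and repeats that calculation. The atomic-weight table is an assumption (a few common elements only, not necessarily the exact table the PDB uses), formulas with charge terms are not handled, and chem_comp_items and formula_weight are hypothetical helper names.

```python
import re
from typing import Dict

# Assumed atomic weights (g/mol), keyed by upper-case element symbol.
ATOMIC_WEIGHTS = {"H": 1.00794, "C": 12.0107, "N": 14.0067,
                  "O": 15.9994, "S": 32.065, "CL": 35.453}


def chem_comp_items(text: str) -> Dict[str, str]:
    """Collect single-line '_chem_comp.<item> value' pairs into a dict.

    Double-quoted values are unquoted; multi-line values are not handled.
    """
    items: Dict[str, str] = {}
    for line in text.splitlines():
        m = re.match(r'\s*_chem_comp\.(\S+)\s+("[^"]*"|\S+)\s*$', line)
        if m:
            items[m.group(1)] = m.group(2).strip('"')
    return items


def formula_weight(formula: str) -> float:
    """Sum atomic weights for a formula written like 'C7 H5 Cl O2'."""
    total = 0.0
    for term in formula.split():
        m = re.fullmatch(r"([A-Za-z]+)(\d*)", term)
        if m is None:
            raise ValueError(f"unsupported formula term: {term!r}")
        symbol, count = m.group(1).upper(), int(m.group(2) or "1")
        total += ATOMIC_WEIGHTS[symbol] * count
    return total


snippet = """_chem_comp.id                 174
_chem_comp.name               "4-CHLORO-BENZOIC ACID"
_chem_comp.formula            "C7 H5 Cl O2"
_chem_comp.formula_weight     156.566"""
items = chem_comp_items(snippet)
print(items["name"])                               # 4-CHLORO-BENZOIC ACID
print(round(formula_weight(items["formula"]), 3))  # 156.566
```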


Chemical Components in PDB Format

The heterogen section of a PDB coordinate file describes ligands in the entry. The chemical name of the ligand is given in the HETNAM record and the chemical formula is given in the FORMUL record. Any synonyms for the chemical name are given in the HETSYN records.

For example, the PDB format file for PDB entry 1t5d contains the ligand 4-Chloro-benzoic Acid (ID code: 174):

  HET    174             15 
  HETNAM     174 4-CHLORO-BENZOIC ACID 
  FORMUL      174    C7 H5 CL O2  

Further information describing this residue (174) is then provided in the Chemical Component Dictionary (see the Example below).

Please refer to the PDB File Format Guide for further description.
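PDB-format records are fixed-column, so these fields are best read by position rather than by splitting on spaces. The Python sketch below assumes the column ranges of the PDB File Format Guide (HET: hetID in columns 8-10, number of HETATM records in 21-25; HETNAM: hetID 12-14, name 16-70; FORMUL: hetID 13-15, formula 20-70); check them against the guide before relying on this. parse_het_records is a hypothetical helper.

```python
from typing import Dict, Iterable


def parse_het_records(lines: Iterable[str]) -> Dict[str, dict]:
    """Collect HET, HETNAM and FORMUL fields per hetID from fixed columns.

    Python slices are 0-based and end-exclusive, so columns 8-10 are [7:10].
    """
    info: Dict[str, dict] = {}
    for line in lines:
        record = line[0:6].rstrip()
        if record == "HET":
            entry = info.setdefault(line[7:10].strip(), {})
            entry["num_atoms"] = int(line[20:25])
        elif record == "HETNAM":
            entry = info.setdefault(line[11:14].strip(), {})
            # Continuation lines are joined with a space (a simplification).
            parts = [entry.get("name", ""), line[15:70].strip()]
            entry["name"] = " ".join(p for p in parts if p)
        elif record == "FORMUL":
            entry = info.setdefault(line[12:15].strip(), {})
            parts = [entry.get("formula", ""), line[19:70].strip()]
            entry["formula"] = " ".join(p for p in parts if p)
    return info


# Build the example records with explicit padding so the columns line up.
records = [
    "HET".ljust(7) + "174" + "15".rjust(15),
    "HETNAM".ljust(11) + "174 " + "4-CHLORO-BENZOIC ACID",
    "FORMUL".ljust(12) + "174".ljust(7) + "C7 H5 CL O2",
]
het = parse_het_records(records)["174"]
print(het["num_atoms"], het["name"], het["formula"], sep=" | ")
# 15 | 4-CHLORO-BENZOIC ACID | C7 H5 CL O2
```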

Example: information available about chemical components found in the PDB archive, illustrated with 4-chloro-benzoic acid (174).
[Diagram of 4-chloro-benzoic acid (174)]
Note: Diagrams are not included in the Chemical Component Dictionary.

Chemical Component Dictionary (mmCIF Format)


data_174
# 
_chem_comp.id                                    174 
_chem_comp.name                                  "4-CHLORO-BENZOIC ACID" 
_chem_comp.type                                  NON-POLYMER 
_chem_comp.pdbx_type                             HETAIN 
_chem_comp.formula                               "C7 H5 Cl O2" 
_chem_comp.mon_nstd_parent_comp_id               ? 
_chem_comp.pdbx_synonyms                         ? 
_chem_comp.pdbx_formal_charge                    0 
_chem_comp.pdbx_initial_date                     2004-05-07 
_chem_comp.pdbx_modified_date                    2008-04-29 
_chem_comp.pdbx_ambiguous_flag                   N 
_chem_comp.pdbx_release_status                   REL 
_chem_comp.pdbx_replaced_by                      ? 
_chem_comp.pdbx_replaces                         ? 
_chem_comp.formula_weight                        156.566 
_chem_comp.one_letter_code                       ? 
_chem_comp.three_letter_code                     174 
_chem_comp.pdbx_model_coordinates_details        ? 
_chem_comp.pdbx_model_coordinates_missing_flag   N 
_chem_comp.pdbx_ideal_coordinates_details        ? 
_chem_comp.pdbx_ideal_coordinates_missing_flag   N 
_chem_comp.pdbx_model_coordinates_db_code        1T5D 
_chem_comp.pdbx_processing_site                  RCSB 
# 
loop_
_chem_comp_atom.comp_id 
_chem_comp_atom.atom_id 
_chem_comp_atom.alt_atom_id 
_chem_comp_atom.type_symbol 
_chem_comp_atom.charge 
_chem_comp_atom.pdbx_align 
_chem_comp_atom.pdbx_aromatic_flag 
_chem_comp_atom.pdbx_leaving_atom_flag 
_chem_comp_atom.pdbx_stereo_config 
_chem_comp_atom.model_Cartn_x 
_chem_comp_atom.model_Cartn_y 
_chem_comp_atom.model_Cartn_z 
_chem_comp_atom.pdbx_model_Cartn_x_ideal 
_chem_comp_atom.pdbx_model_Cartn_y_ideal 
_chem_comp_atom.pdbx_model_Cartn_z_ideal 
_chem_comp_atom.pdbx_ordinal 
174 CL4 CL4 CL 0 0 N N N -19.787 95.862 18.541 0.032  -0.000 -3.376 1
174 C4  C4  C  0 1 Y N N -19.932 94.201 19.219 0.005  -0.000 -1.640 2
174 C5  C5  C  0 1 Y N N -18.817 93.715 19.901 -1.205 0.000  -0.969 3
174 C6  C6  C  0 1 Y N N -18.847 92.452 20.466 -1.233 0.000  0.409  4
174 C3  C3  C  0 1 Y N N -21.099 93.428 19.089 1.196  -0.000 -0.932 5
174 C2  C2  C  0 1 Y N N -21.127 92.158 19.664 1.182  0.004  0.446  6
174 C1  C1  C  0 1 Y N N -19.996 91.681 20.342 -0.036 -0.000 1.128  7
174 C   C   C  0 1 N N N -19.962 90.330 20.989 -0.059 -0.000 2.605  8
174 O1  O1  O  0 1 N N N -20.968 89.592 20.924 1.097  -0.001 3.296  9
174 O2  O2  O  0 1 N N N -18.919 89.991 21.597 -1.120 0.000  3.196  10
174 H5  H5  H  0 1 N N N -17.907 94.332 19.994 -2.130 0.001  -1.526 11 
174 H6  H6  H  0 1 N N N -17.967 92.065 21.008 -2.178 0.000  0.931  12 
174 H3  H3  H  0 1 N N N -21.978 93.812 18.545 2.138  -0.001 -1.461 13 
174 H2  H2  H  0 1 N N N -22.035 91.537 19.583 2.110  0.003  0.997  14 
174 HO1 HO1 H  0 1 N N N -20.946 88.735 21.334 1.082  -0.001 4.263  15 
# 
loop_
_chem_comp_bond.comp_id 
_chem_comp_bond.atom_id_1 
_chem_comp_bond.atom_id_2 
_chem_comp_bond.value_order 
_chem_comp_bond.pdbx_aromatic_flag 
_chem_comp_bond.pdbx_stereo_config 
_chem_comp_bond.pdbx_ordinal 
174 CL4 C4  SING N N 1
174 C4  C5  DOUB Y N 2
174 C4  C3  SING Y N 3
174 C5  C6  SING Y N 4
174 C5  H5  SING N N 5
174 C6  C1  DOUB Y N 6
174 C6  H6  SING N N 7
174 C3  C2  DOUB Y N 8
174 C3  H3  SING N N 9
174 C2  C1  SING Y N 10
174 C2  H2  SING N N 11
174 C1  C   SING N N 12
174 C   O1  SING N N 13
174 C   O2  DOUB N N 14
174 O1  HO1 SING N N 15
# 
loop_
_pdbx_chem_comp_descriptor.comp_id 
_pdbx_chem_comp_descriptor.type 
_pdbx_chem_comp_descriptor.program 
_pdbx_chem_comp_descriptor.program_version 
_pdbx_chem_comp_descriptor.descriptor 
174 SMILES            ACDLabs               10.04  O=C(O)c1ccc(Cl)cc1
174 SMILES_CANONICAL  CACTVS                3.341  OC(=O)c1ccc(Cl)cc1
174 SMILES            CACTVS                3.341  OC(=O)c1ccc(Cl)cc1
174 SMILES_CANONICAL  "OpenEye OEToolkits"  1.5.0  c1cc(ccc1C(=O)O)Cl
174 SMILES            "OpenEye OEToolkits"  1.5.0  c1cc(ccc1C(=O)O)Cl
174 InChI             InChI                 1.02b  InChI=1/C7H5ClO2/c8-6-3-1-5(2-4-6)7(9)10/h1-4H,(H,9,10)/f/h9H
174 InChIKey          InChI                 1.02b  XRHGYUZYPHTUJZ-BGGKNDAXCA
# 
loop_
_pdbx_chem_comp_identifier.comp_id 
_pdbx_chem_comp_identifier.type 
_pdbx_chem_comp_identifier.program 
_pdbx_chem_comp_identifier.program_version 
_pdbx_chem_comp_identifier.identifier 
174 "SYSTEMATIC NAME" ACDLabs              10.04 "4-chlorobenzoic acid"
174 "SYSTEMATIC NAME" "OpenEye OEToolkits" 1.5.0 "4-chlorobenzoic acid"
#
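The atom table is enough to regenerate the formula stored in _chem_comp.formula, which makes a convenient consistency check. The Python sketch below counts the type_symbol values and writes them in Hill order (carbon, then hydrogen, then the other elements alphabetically), omitting counts of 1 as in "C7 H5 Cl O2". That every dictionary formula follows exactly this convention is an assumption; hill_formula is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List


def hill_formula(type_symbols: Iterable[str]) -> str:
    """Return a space-separated Hill-order formula, e.g. 'C7 H5 Cl O2'."""
    # Dictionary symbols are upper case (CL); formulas use Cl.
    counts = Counter(symbol.capitalize() for symbol in type_symbols)
    order: List[str] = []
    if "C" in counts:
        order = [e for e in ("C", "H") if e in counts]
    order += sorted(e for e in counts if e not in order)
    return " ".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


# type_symbol column of the chem_comp_atom rows for component 174
symbols = ["CL", "C", "C", "C", "C", "C", "C", "C", "O", "O",
           "H", "H", "H", "H", "H"]
print(hill_formula(symbols))  # C7 H5 Cl O2
```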

Heterogen List (PDB Format)

RESIDUE   174     15
CONECT      CL4    1 C4
CONECT      C4     3 CL4     C5   C3
CONECT      C5     3 C4      C6   H5
CONECT      C6     3 C5      C1   H6
CONECT      C3     3 C4      C2   H3
CONECT      C2     3 C3      C1   H2
CONECT      C1     3 C6      C2   C
CONECT      C      3 C1      O1   O2
CONECT      O1     2 C       HO1 
CONECT      O2     1 C
CONECT      H5     1 C5
CONECT      H6     1 C6
CONECT      H3     1 C3
CONECT      H2     1 C2
CONECT      HO1    1 O1
END
HET    174             15
HETNAM     174 4-CHLORO-BENZOIC ACID
FORMUL      174    C7 H5 CL O2
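The CONECT records of a HET dictionary entry list each atom, its number of bonded partners, and their names, so they translate directly into an adjacency list. The Python sketch below assumes whitespace-separated fields and atom names without embedded spaces (true of this entry, but an assumption in general); parse_conect and is_symmetric are hypothetical helper names. Because every bond is listed from both ends, a symmetry check is a cheap sanity test.

```python
from typing import Dict, List


def parse_conect(text: str) -> Dict[str, List[str]]:
    """Map each atom in CONECT records to its list of bonded atoms."""
    graph: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "CONECT":
            continue  # skip RESIDUE, END, HET, HETNAM, FORMUL and blank lines
        atom, count, neighbors = fields[1], int(fields[2]), fields[3:]
        if len(neighbors) != count:
            raise ValueError(f"{atom}: expected {count} neighbors, found {len(neighbors)}")
        graph[atom] = neighbors
    return graph


def is_symmetric(graph: Dict[str, List[str]]) -> bool:
    # Each bond A-B should appear in A's list and in B's list.
    return all(a in graph.get(b, []) for a, nbrs in graph.items() for b in nbrs)


entry = """RESIDUE   174     15
CONECT      CL4    1 C4
CONECT      C4     3 CL4     C5   C3
CONECT      C5     3 C4      C6   H5
CONECT      C6     3 C5      C1   H6
CONECT      C3     3 C4      C2   H3
CONECT      C2     3 C3      C1   H2
CONECT      C1     3 C6      C2   C
CONECT      C      3 C1      O1   O2
CONECT      O1     2 C       HO1
CONECT      O2     1 C
CONECT      H5     1 C5
CONECT      H6     1 C6
CONECT      H3     1 C3
CONECT      H2     1 C2
CONECT      HO1    1 O1
END"""
graph = parse_conect(entry)
print(len(graph), is_symmetric(graph))           # 15 True
print(sum(len(n) for n in graph.values()) // 2)  # 15
print(graph["C"])                                # ['C1', 'O1', 'O2']
```

The bond count of 15 obtained this way agrees with the 15 rows of the chem_comp_bond table in the mmCIF definition above.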

[1] D. Dimitropoulos, J. Ionides, K. Henrick (2006). UNIT 14.3: Using MSDchem to search the PDB ligand dictionary. In Current Protocols in Bioinformatics (A.D. Baxevanis, R.D.M. Page, G.A. Petsko, L.D. Stein, and G.D. Stormo, eds.), pp. 14.3.1-14.3.3. John Wiley & Sons, Hoboken, NJ.
[2] Z. Feng, L. Chen, H. Maddula, O. Akcan, R. Oughtred, H.M. Berman, J. Westbrook (2004). Ligand Depot: a data warehouse for ligands bound to macromolecules. Bioinformatics 20(13):2153-2155.

© wwPDB