wwPDB 2015 News

Contents

08/24/2015 Phased PDB Release Process
08/11/2015 Announcing the 2015 EMDataBank Map Challenge
07/29/2015 Data correspondences between the PDB and CSD archives now available
07/08/2015 Hybrid/Integrative Methods Paper Published
04/11/2015 Phased PDB Release Process
02/12/2015 Changes to the Release Process for PDB Entries
01/27/2015 Time-stamped Copies of the PDB Archive

08/24/2015

Phased PDB Release Process

As announced previously, the weekly public release of data from the Protein Data Bank (PDB) archive is divided into two phases to better serve the needs of methods developers focused on protein structure prediction and protein-ligand docking. Each week, these developer communities have ~4 days in which to make blind predictions of protein or nucleic acid structure from polymer sequence, and of ligand docking pose from polymer sequence and the InChI string of the bound ligand. Additionally, crystallization pH value(s) are now part of this phased release.

Phase I: Every Saturday by 03:00 UTC, for every new entry, the wwPDB website provides the sequence(s) (amino acid or nucleotide) for each distinct polymer (new_release_structure_sequence.tsv) and, where appropriate, the InChI string(s) for each distinct ligand (new_release_structure_nonpolymer.tsv) and the crystallization pH value(s) (new_release_crystallization_pH.tsv).

Phase II: Every Wednesday by 00:00 UTC, all new and modified data entries will be updated at each of the wwPDB FTP sites.
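The two deadlines above can be worked out for any given week. The following is a minimal sketch, not a wwPDB tool: the function names next_weekly and release_window are hypothetical, and it assumes the caller passes a timezone-aware datetime. It also shows where the "~4 days" figure comes from (Saturday 03:00 UTC to Wednesday 00:00 UTC is 3 days 21 hours).

```python
from datetime import datetime, timedelta, timezone

# Weekday numbers follow datetime.weekday(): Monday = 0 ... Sunday = 6.
SATURDAY, WEDNESDAY = 5, 2


def next_weekly(now, weekday, hour):
    """Return the first moment strictly after `now` that falls on
    `weekday` at `hour`:00 UTC. `now` must be timezone-aware."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # The slot for this week has already passed; use next week's.
        candidate += timedelta(days=7)
    return candidate


def release_window(now):
    """Return (phase1, phase2): the next Phase I deadline (Saturday
    03:00 UTC) and the Phase II deadline that follows it (Wednesday
    00:00 UTC)."""
    phase1 = next_weekly(now, SATURDAY, 3)
    phase2 = next_weekly(phase1, WEDNESDAY, 0)
    return phase1, phase2


if __name__ == "__main__":
    p1, p2 = release_window(datetime.now(timezone.utc))
    print("Phase I :", p1.isoformat())
    print("Phase II:", p2.isoformat())
    print("Blind-prediction window:", p2 - p1)
```

Both deadlines are stated as "by" times, so the actual files may appear earlier; the sketch only computes the latest expected moment.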

08/11/2015

Announcing the 2015 EMDataBank Map Challenge


EMDataBank/Unified Data Resource for 3DEM is pleased to announce the 2015 Map Challenge.

All members of the scientific community, at all levels of experience, are invited to participate as Challengers and/or Assessors.

Seven benchmark raw image datasets have been selected for the challenge. Six are selected from recently described single particle structure determinations with image data collected as multi-frame movies; one is based on simulated (in silico) images.

Challengers are sought to create single particle reconstructions from the targets, and then to upload their results with associated details.

Assessors are sought to participate in evaluating submitted reconstructions.

Registration is now open for all interested participants. Challengers may submit maps between August and December. Before submissions open, all are encouraged to provide feedback on submission requirements. The open assessment period will commence in early 2016.

To learn more about this challenge and to register, please visit http://challenges.emdatabank.org and click on "MAP CHALLENGE" in the menu bar.

The map challenge is the first of two community-wide challenges being sponsored by EMDataBank in 2015 to critically evaluate 3DEM methods that are coming into use, with the ultimate goal of developing validation criteria associated with every 3DEM map and map-derived model. The second challenge, focused on creating coordinate models for 3DEM maps, will be announced later this year.

07/29/2015

Data correspondences between the PDB and CSD archives now available

The Worldwide Protein Data Bank and the Cambridge Crystallographic Data Centre (CCDC; http://www.ccdc.cam.ac.uk) are pleased to announce the availability of a new data resource containing correspondences between the biopolymer components and ligand molecules found in the PDB archive and the exactly matching small-molecule X-ray structures in the Cambridge Structural Database (CSD) archive.

The chemical structure of every unique molecule in the Protein Data Bank is described in the PDB Chemical Component Dictionary. The new PDB Chemical Component Model data file complements information in the PDB by providing the following CSD information for matching molecular entries: accession code correspondences, Cartesian coordinates and R-value, data-collection temperature and a disorder flag, SMILES and InChI descriptors, and a Digital Object Identifier (DOI) for the citation associated with the CSD entry.

At present, there are 20,077 chemical components in the PDB Chemical Component Dictionary, and for 1,418 of these, exactly matching structures have been identified in the CSD. The new PDB Chemical Component Model file is available from the PDB FTP archive via:

RCSB PDB ftp://ftp.wwpdb.org/pub/pdb/data/component-models/complete/chem_comp_model.cif
PDBe ftp://ftp.ebi.ac.uk/pub/databases/pdb/data/component-models/complete/chem_comp_model.cif
PDBj ftp://ftp.pdbj.org/pub/pdb/data/component-models/complete/chem_comp_model.cif
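The three locations above differ only in their site-specific root; the path below that root is the same. A short sketch of that layout follows, assuming the split between root and path shown here. The names MIRROR_ROOTS, MODEL_PATH and model_url are hypothetical and used only for illustration.

```python
# Site-specific roots, taken from the three URLs listed above.
MIRROR_ROOTS = {
    "RCSB PDB": "ftp://ftp.wwpdb.org/pub/pdb/",
    "PDBe": "ftp://ftp.ebi.ac.uk/pub/databases/pdb/",
    "PDBj": "ftp://ftp.pdbj.org/pub/pdb/",
}

# Path shared by all three sites below their roots.
MODEL_PATH = "data/component-models/complete/chem_comp_model.cif"


def model_url(site):
    """Return the chem_comp_model.cif URL for one of the listed sites."""
    return MIRROR_ROOTS[site] + MODEL_PATH


if __name__ == "__main__":
    for site in MIRROR_ROOTS:
        print(site, model_url(site))
```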

The PDB Chemical Component Model data file will be updated during the first week of each month. Later this year, the model files and the PDB Chemical Component Dictionary entries will also be made available as individual files, one for each component. As a service to the scientific community, the wwPDB partners and the CCDC make all these files available freely and without restrictions on use.

Work at the CCDC was supported by BBSRC grant BB/K016970/1 to PDBe and work at RCSB PDB was supported by NSF grant DBI-1338415.

About The Cambridge Crystallographic Data Centre

The Cambridge Crystallographic Data Centre is dedicated to the advancement of chemistry and crystallography for the public benefit. It supports structural chemistry worldwide through collaborative research studies and by developing the Cambridge Structural Database (CSD), the world's only comprehensive, up-to-date, and fully-curated knowledge base of small molecule crystal structures.

The CSD was established 50 years ago as the world's first numerical database and now comprises over 775,000 entries. The CCDC enhances its value to research scientists by providing state-of-the-art structural analysis software and expert research services for receptor modelling, ligand design, docking, lead optimization, formulation studies and materials research. The CSD and associated software services are delivered to around 1,400 research sites worldwide, including academic institutions in 80 countries and all of the world's top pharmaceutical and chemical companies.

Originating in the Department of Chemistry at the University of Cambridge, the CCDC is now a UK Research Council Independent Research Organisation and a University of Cambridge Partner Institute, constituted as a registered charity. With 50 years of scientific expertise, the CCDC has demonstrated its strong track record in basic research through more than 750 peer-reviewed publications.

07/08/2015

Hybrid/Integrative Methods Paper Published

The wwPDB established a wwPDB Hybrid Methods Task Force composed of experts in the various experimental fields that contribute to hybrid studies, experts in hybrid modeling, and experts in archiving. The white paper from the first meeting has been published: Outcome of the First wwPDB Hybrid/Integrative Methods Task Force Workshop. Structure 23: 1156–1167. doi: 10.1016/j.str.2015.05.013

04/11/2015

Phased PDB Release Process

As announced previously, the weekly public release of data from the Protein Data Bank (PDB) archive is now divided into two phases to better serve the needs of methods developers focused on protein structure prediction and protein-ligand docking. Each week, these developer communities will have ~4 days in which to make blind predictions of protein or nucleic acid structure from polymer sequence, and of ligand docking pose from polymer sequence and the InChI string of the bound ligand.

Phase I: Every Saturday by 03:00 UTC, for every new entry, the wwPDB website provides the sequence(s) (amino acid or nucleotide) for each distinct polymer (new_release_structure_sequence.tsv) and, where appropriate, the InChI string(s) for each distinct ligand (new_release_structure_nonpolymer.tsv).

Phase II: Every Wednesday by 00:00 UTC, all new and modified data entries will be updated at each of the wwPDB FTP sites.

02/12/2015

Changes to the Release Process for PDB Entries

Effective April 10, 2015, the weekly public release of data from the Protein Data Bank (PDB) archive will be divided into two phases to better serve the needs of methods developers focused on protein structure prediction and protein-ligand docking. Each week, these developer communities will have ~4 days in which to make blind predictions of protein or nucleic acid structure from polymer sequence, and of ligand docking pose from polymer sequence and the InChI string of the bound ligand.

Deposition and annotation of PDB entries are managed in a collaborative manner by the Worldwide Protein Data Bank (wwPDB; wwpdb.org), which currently consists of four regional data centers: RCSB PDB (USA), PDBe (Europe), PDBj (Japan), and BioMagResBank (USA).

Release options for PDB entries include:

  • REL: Immediate release upon completion of the deposition and annotation process
  • HPUB: Release at the time of publication of the primary citation associated with the entry
  • HOLD: Release on a prescribed date up to one year following the date of deposition

Following these release options, data entries are added to the PDB archive on a weekly schedule synchronized among FTP sites at RCSB PDB, PDBe, and PDBj.
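The HOLD option bounds the release date by the deposition date. A minimal sketch of that check is given below. It is an illustration only: the function names are hypothetical, treating "up to one year" as inclusive of the anniversary is an assumption, and so is the handling of a 29 February deposition.

```python
from datetime import date


def latest_hold_date(deposited):
    """Latest release date allowed under HOLD: one year after deposition."""
    try:
        return deposited.replace(year=deposited.year + 1)
    except ValueError:
        # 29 February has no counterpart in the following year;
        # falling back to 28 February is an assumption of this sketch.
        return deposited.replace(year=deposited.year + 1, day=28)


def is_valid_hold(deposited, release):
    """True if `release` lies between the deposition date and the
    one-year limit, both ends included (an assumption)."""
    return deposited <= release <= latest_hold_date(deposited)


if __name__ == "__main__":
    deposited = date(2015, 2, 12)
    print(latest_hold_date(deposited))
    print(is_valid_hold(deposited, date(2015, 12, 1)))
```

Whatever date is chosen, the entry still enters the archive through the weekly schedule, so the effective public release falls on the next weekly update.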

The revised process for weekly PDB archive data release will be as follows:

Phase I: Every Saturday by 03:00 UTC, for every new entry, the following will be provided from the wwPDB website: sequence(s) (amino acid or nucleotide) for each distinct polymer and, where appropriate, the InChI string(s) for each distinct ligand.

Phase II: Every Wednesday by 00:00 UTC, all new and modified data entries will be updated at each of the wwPDB FTP sites.

This change is being made with the advice and concurrence of the Advisory Committee to the Worldwide Protein Data Bank.

01/27/2015

Time-stamped Copies of the PDB Archive

A snapshot of the PDB archive (ftp://ftp.wwpdb.org) as of January 2, 2015 has been added to ftp://snapshots.wwpdb.org/. Snapshots have been archived annually since January 2005 to provide readily identifiable data sets for research on the PDB archive.

The directory 20150102 includes the 105,465 experimentally determined coordinate files and related experimental data that were available at that time. Coordinate data are available in PDB, mmCIF, and XML formats. The date and time stamp of each file indicates the last time the file was modified. The snapshot is 438 GB.

The script at ftp://snapshots.wwpdb.org/rsyncSnapshots.sh may be used to make a local copy of a snapshot or sections of the snapshot.
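The directory name 20150102 appears to encode the snapshot date as YYYYMMDD. Assuming that convention holds (it is inferred from this one directory name, not stated in the announcement), a snapshot location can be derived from its date as sketched below. The name snapshot_url is hypothetical.

```python
from datetime import date

SNAPSHOT_ROOT = "ftp://snapshots.wwpdb.org/"


def snapshot_url(snapshot_date):
    """Build a snapshot directory URL, assuming directories are named
    after the snapshot date in YYYYMMDD form."""
    return SNAPSHOT_ROOT + snapshot_date.strftime("%Y%m%d") + "/"


if __name__ == "__main__":
    print(snapshot_url(date(2015, 1, 2)))
```

Snapshot dates for other years are not given here, so the server's directory listing should be checked before relying on a computed name.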