With this week's update, the PDB archive has passed the milestone of 150,000 entries, and now contains a total of 150,145.
Established in 1971, this central, public archive has reached the milestone thanks to the efforts of structural biologists throughout the world, who collectively contribute a wealth of experimentally determined protein and nucleic acid structure data. These data are made available to researchers worldwide across many different disciplines.
Four wwPDB data centers support online access to three-dimensional structures of biological macromolecules that help researchers understand many facets of biomedicine, agriculture, and ecology, from protein synthesis to health and disease to biological energy. The archive is large, containing more than 1.9 million files related to these PDB entries and requiring more than 512 gigabytes of storage.
The archive reached the landmark of 100,000 entries in 2014, the International Year of Crystallography. Since that record was set, the PDB has continued to grow rapidly, both in the number of deposited structures and in the complexity of the data. This growth has been supported by the launch of OneDep, a common global system for deposition, validation, and biocuration of PDB data for supported experimental methods. The OneDep system and the underlying PDBx/mmCIF archive format enable the PDB archive to adapt over time to meet the challenges posed by developments in structural biology. More than 41,000 structures that have been deposited, annotated, and validated using OneDep have now been released into the PDB archive, with many more entries updated to ensure consistency of the archive.
With this week's regular update, the PDB welcomes 262 new structures into the archive. These structures join others vital to research and education in fundamental biology, biomedicine, and bioenergy. Since its inception, the size of the archive has increased tenfold roughly every 10-15 years: the PDB reached 100 released entries in 1982, 1,000 entries in 1993, and 10,000 in the year 2000. Now that the 150,000th entry has been made available, more than half of the archive has been released in the past ten years.
The scientific community eagerly awaits the next 150,000 structures and the invaluable knowledge these new data will bring. However, the increasing number, size and complexity of biological data being deposited in the PDB and the emergence of hybrid structure determination methods constitute major challenges for the management and representation of structural data. wwPDB will continue to work with the community to meet these challenges and ensure that the archive maintains the highest possible standards of quality, integrity, and consistency.
The development and future of the PDB archive and the wwPDB organization are described in the new reference publication for the PDB archive: Protein Data Bank: the single global archive for 3D macromolecular structure data (Nucleic Acids Res., 2019) and many other papers, including Protein Data Bank (PDB): The Single Global Macromolecular Structure Archive (Methods in Molecular Biology, 2017), How community has shaped the Protein Data Bank (Structure, 2013), and Creating a Community Resource for Protein Science (Protein Science, 2012). A full list is available.
Submission of PDBx/mmCIF format files for crystallographic depositions to the PDB will be mandatory from July 1st 2019 onward. PDB format files will no longer be accepted for deposition of structures solved by MX techniques.
Starting July 1st 2019, PDBx/mmCIF will be the only format accepted via OneDep for deposition of PDB structures resulting from macromolecular crystallography (MX), including X-ray, neutron, fiber, and electron diffraction methods. The deposition of PDBx/mmCIF format files will improve the efficiency of the deposition process and enhance validation through capture of the more extensive experimental metadata supported by PDBx/mmCIF, compared to the legacy PDB format. PDB entries with 100,000 or more atoms, and those with multiple-character chain IDs, are already unsupported by the legacy PDB format. In addition, by 2021, we anticipate the PDB Chemical Component Identifier will need to be extended beyond three characters, which will necessarily result in full retirement of files in the PDB Core Archive that utilize the legacy PDB format.
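The legacy-format limits described above can be expressed as a simple check. The following Python sketch is illustrative only: the `EntrySummary` class and the `legacy_pdb_problems` function are hypothetical names introduced here, not part of OneDep or any wwPDB tool, and the thresholds are taken directly from the limits stated in this announcement.

```python
from dataclasses import dataclass
from typing import List

# Limits of the legacy PDB format as stated in the announcement.
MAX_ATOMS = 99_999      # entries with 100,000 or more atoms are not supported
MAX_CHAIN_ID_LEN = 1    # multiple-character chain IDs are not supported
MAX_COMP_ID_LEN = 3     # Chemical Component Identifiers are three characters today


@dataclass
class EntrySummary:
    """Hypothetical summary of an entry (not a wwPDB data structure)."""
    atom_count: int
    chain_ids: List[str]
    component_ids: List[str]


def legacy_pdb_problems(entry: EntrySummary) -> List[str]:
    """Return the reasons an entry cannot be represented in legacy PDB format."""
    problems = []
    if entry.atom_count > MAX_ATOMS:
        problems.append(f"{entry.atom_count} atoms exceeds the limit of {MAX_ATOMS}")
    long_chains = [c for c in entry.chain_ids if len(c) > MAX_CHAIN_ID_LEN]
    if long_chains:
        problems.append(f"multiple-character chain IDs: {long_chains}")
    long_comps = [c for c in entry.component_ids if len(c) > MAX_COMP_ID_LEN]
    if long_comps:
        problems.append(f"component IDs longer than {MAX_COMP_ID_LEN} characters: {long_comps}")
    return problems


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    large = EntrySummary(atom_count=250_000, chain_ids=["A", "AA"], component_ids=["ALA"])
    for problem in legacy_pdb_problems(large):
        print(problem)
```

An entry for which the function returns an empty list fits within the stated limits. Any other entry can only be represented in PDBx/mmCIF.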
Refmac, Phenix.refine, and Buster programs can now output PDBx/mmCIF formatted files. For users of other structure determination/refinement software packages, the wwPDB provides stand-alone and web-based tools to convert legacy PDB format files into PDBx/mmCIF format: pdb_extract and MAXIT. More information on outputting and preparing PDBx/mmCIF format files for deposition can be found on the wwPDB website.
The PDBx/mmCIF Working Group has committed to the PDBx/mmCIF data model. PDBx/mmCIF is also supported by visualization software applications, including Jmol/JSMol, LiteMol, Chimera, OpenRasMol, CCP4MG, COOT, PyMOL, VMD, MolMil, and NGL. In addition, other data resources, such as the Protein Model Portal and SASBDB, have adopted and extended the PDBx/mmCIF framework for data representation.
If you have any queries or comments regarding these changes, please contact the wwPDB consortium via email@example.com.
In 2018, the Office of Research Integrity (ORI) of the U.S. Department of Health and Human Services announced its final Research Misconduct Finding in the case of H.M. Krishna Murthy. It was found that Murthy reported falsified and/or fabricated research in 10 journal publications and 12 corresponding PDB structures. While the ORI was gathering and evaluating evidence in this case, 5 Murthy structures in the PDB were obsoleted in accord with wwPDB policies, in response to retraction of 4 journal publications. Following a formal request from ORI, received on April 23rd 2018, the remaining 7 Murthy structures in the PDB were obsoleted, again in accord with wwPDB policies. ORI's conduct of its investigations is designed to ensure due process for individuals accused of research misconduct, and strict confidentiality is maintained throughout. Only final findings of research misconduct are made public.
In December 2009, the University of Alabama at Birmingham (UAB) announced that it planned to retract 12 PDB entries and 10 related publications authored by H.M. Krishna Murthy, in his capacity as Principal Investigator and UAB employee. Following wwPDB review of the structures at the request of UAB, and in accord with wwPDB policy, 5 of the structures were obsoleted upon retraction of the related publications by the journals, and the remaining 7 structures were obsoleted upon receipt of the ORI request.
Since that time, the 5 publications associated with the 7 structures have been retracted by the journals. A detailed PDB history of this case is available.
Structure Validation and the Role of the PDB as an Archival Data Resource
The PDB is an archival resource that stores, annotates, and disseminates structure models and their related experimental data. The wwPDB has convened expert, community-driven Validation Task Forces for X-ray (in 2008), NMR (in 2009), and (in collaboration with the EMDataBank) Cryo-EM (in 2010) to advise on the most suitable criteria to use for validating structure entries (model, experimental data, and fit of model to data) when they are deposited. Recommendations of these validation task forces have been implemented as part of the wwPDB OneDep system for deposition, annotation, and validation of PDB structures.
The results of these wwPDB validation procedures are captured in a report that is provided to depositors and can be transmitted by them to the journal to which the corresponding manuscript is submitted. Availability of such a report greatly facilitates assessment of the reliability of structural data and its interpretation by journal editors and referees alike. The wwPDB has urged journals publishing structural data on biological macromolecules to require submission of the wwPDB validation report together with the manuscript. The continuing mission of the wwPDB partners is to safeguard the integrity and improve the quality of the structural archive, with the support of the international structural biology community.
A snapshot of the PDB archive (ftp://ftp.wwpdb.org) as of January 1, 2019 has been added to ftp://snapshots.wwpdb.org and ftp://snapshots.pdbj.org. Snapshots have been archived annually since 2005 to provide readily identifiable data sets for research on the PDB archive.
The directory 20190101 includes the 147,610 experimentally determined structures and related experimental data available at that time. Atomic coordinates and related metadata are available in PDBx/mmCIF, PDB, and XML file formats. The date and time stamp of each file indicates the last time the file was modified. The snapshot is 1,529 GB.
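For scripts that need to refer to a particular annual snapshot, the directory name can be derived from the year. The Python sketch below rests on two assumptions that this announcement does not confirm: that every annual snapshot directory is named for January 1 of its year (only 20190101 is stated here), and that the directory sits at the top level of each snapshot server. The function names are hypothetical.

```python
from datetime import date
from typing import List

# Snapshot servers named in the announcement.
SNAPSHOT_HOSTS = ("snapshots.wwpdb.org", "snapshots.pdbj.org")
# Snapshots have been archived annually since 2005.
FIRST_SNAPSHOT_YEAR = 2005


def snapshot_directory(year: int) -> str:
    """Return the assumed directory name for a year's snapshot, e.g. 20190101."""
    if year < FIRST_SNAPSHOT_YEAR:
        raise ValueError(f"no annual snapshots before {FIRST_SNAPSHOT_YEAR}")
    # Assumption: each snapshot is named for January 1 of its year.
    return date(year, 1, 1).strftime("%Y%m%d")


def snapshot_urls(year: int) -> List[str]:
    """Return assumed FTP URLs for the snapshot on each server."""
    # Assumption: the snapshot directory is at the top level of each host.
    directory = snapshot_directory(year)
    return [f"ftp://{host}/{directory}" for host in SNAPSHOT_HOSTS]


if __name__ == "__main__":
    for url in snapshot_urls(2019):
        print(url)
```

The actual directory layout should be checked on the servers before relying on these paths.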