wwPDB 2024 News

Contents

02/11/2024 Biocurator Milestone: >10,000 Depositions Processed
02/05/2024 Preprint Published on NMR Restraint Validation
02/01/2024 Preprint Published on CryoEM Archiving and Validation Recommendations
01/29/2024 Preprint Published on NextGen Archive
01/28/2024 Prizes Awarded at The Biophysical Society Japan Meeting
01/08/2024 Resources for Supporting the Extended PDB ID Format (pdb_00001abc)
01/03/2024 Time-stamped Copies of PDB and EMDB Archives

02/11/2024

Biocurator Milestone: >10,000 Depositions Processed

Congratulations to biocurator Minyu Chen on processing over 10,000 PDB depositions. She is the second biocurator at PDBj, and the fifth in the wwPDB, to reach this milestone. Yumiko Kengaku reached this milestone in April 2021.

Minyu received her PhD in Environmental Engineering from Osaka University and joined PDB in 2007 after working at the National Cerebral and Cardiovascular Center, Osaka. She now works at the branch office of PDBj in the Protein Research Foundation, Osaka. She has established herself as a highly qualified professional with a deep understanding of scientific data and various experimental techniques, and a dedication to exceptional-quality data curation. Her data curation expertise and commitment to excellence have contributed to a high-quality data archive for the benefit of the scientific community. We congratulate Minyu on this exciting accomplishment and look forward to her future success.

<I>Chairman of the Protein Research Foundation, Prof. Toshiharu Hase, and Dr. Minyu Chen.</I>
<I>Milestone tumbler.</I>

02/05/2024

Preprint Published on NMR Restraint Validation

<I>Graphical Abstract</I>

This manuscript addresses the challenge of validating experimental biomolecular NMR structures against restraint data. The NMR Exchange Format (NEF) and NMR-STAR formats provide a standardized approach for representing commonly used NMR restraints. Using these restraint formats, a standardized validation system for assessing structural models of biopolymers against restraints has been developed and implemented in the wwPDB OneDep data harvesting system. The resulting wwPDB Restraint Violation Report provides a model-vs-data assessment of biomolecular structures determined using distance and dihedral restraints, with extensions to other restraint types currently being implemented. These tools are useful for assessing NMR models, as well as for assessing biomolecular structure predictions based on distance restraints. We present the rationale for model-vs-data restraint validation by the wwPDB, together with a summary of the validation tools and reports for NMR distance and dihedral restraints that have been developed, as implemented in the wwPDB validation pipeline and recommended by the wwPDB NMR-VTF committee.

Restraint Validation of Biomolecular Structures Determined by NMR in the Protein Data Bank
Kumaran Baskaran, Eliza Ploskon, Roberto Tejero, Masashi Yokochi, Deborah Harrus, Yuhe Liang, Ezra Peisach, Irina Persikova, Theresa A Ramelot, Monica Sekharan, James Tolchard, John D Westbrook, Benjamin Bardiaux, Charles Schwieters, Ardan Patwardhan, Sameer Velankar, Stephen K Burley, Genji Kurisu, Jeffrey C Hoch, Gaetano T Montelione, Geerten W Vuister, Jasmine Y Young
(2024) bioRxiv 2024.01.15.575520; doi: 10.1101/2024.01.15.575520

wwPDB plans to further enhance the validation report by providing model-vs-data quality assessment for other kinds of restraints, based on community recommendations, and to improve data representation for structures with multiple conformational states.
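The model-vs-data idea behind a restraint violation check can be illustrated with a minimal sketch. This is not the wwPDB implementation: the function names, the toy coordinates, the atom labels, and the simple upper-bound-only treatment are all assumptions chosen for illustration.

```python
import math

def upper_bound_violation(atom_a, atom_b, upper, tolerance=0.0):
    """Return how far (in the same units as the coordinates) the model distance
    exceeds the restraint's upper bound, or 0.0 if the restraint is satisfied.
    Hypothetical helper; real pipelines also handle lower bounds and ambiguity."""
    excess = math.dist(atom_a, atom_b) - upper
    return excess if excess > tolerance else 0.0

def violated_restraints(model, restraints, tolerance=0.0):
    """List (atom_a, atom_b, violation) for every violated restraint.

    model: dict mapping an atom label to (x, y, z) coordinates.
    restraints: iterable of (atom_a_label, atom_b_label, upper_bound).
    """
    report = []
    for name_a, name_b, upper in restraints:
        v = upper_bound_violation(model[name_a], model[name_b], upper, tolerance)
        if v > 0.0:
            report.append((name_a, name_b, v))
    return report

# Toy model with made-up coordinates and labels (illustrative only).
toy_model = {"A:1:HA": (0.0, 0.0, 0.0), "A:5:HN": (3.0, 4.0, 0.0)}
toy_restraints = [("A:1:HA", "A:5:HN", 4.5), ("A:1:HA", "A:5:HN", 6.0)]
report = violated_restraints(toy_model, toy_restraints)
```

In this toy case the two atoms are 5.0 units apart, so only the first restraint (upper bound 4.5) is reported as violated.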

02/01/2024

Preprint Published on CryoEM Archiving and Validation Recommendations

<I>The number of released EMDB entries per year in a number of resolution bins, from 2010 until December 2023</I>

A workshop was held at EMBL-EBI (Hinxton, UK) in January 2020 to discuss data requirements for the deposition and validation of cryoEM structures, with a focus on single-particle analysis, and to set community recommendations.

Community recommendations on cryoEM data archiving and validation
Gerard J. Kleywegt, Paul D. Adams, Sarah J. Butcher, Cathy Lawson, Alexis Rohou, Peter B. Rosenthal, Sriram Subramaniam, Maya Topf, Sanja Abbott, Philip R. Baldwin, John M. Berrisford, Gérard Bricogne, Preeti Choudhary, Tristan I. Croll, Radostin Danev, Sai J. Ganesan, Timothy Grant, Aleksandras Gutmanas, Richard Henderson, J. Bernard Heymann, Juha T. Huiskonen, Andrei Istrate, Takayuki Kato, Gabriel C. Lander, Shee-Mei Lok, Steven J. Ludtke, Garib N. Murshudov, Ryan Pye, Grigore D. Pintilie, Jane S. Richardson, Carsten Sachse, Osman Salih, Sjors H.W. Scheres, Gunnar F. Schroeder, Carlos Oscar S. Sorzano, Scott M. Stagg, Zhe Wang, Rangana Warshamanage, John D. Westbrook, Martyn D. Winn, Jasmine Y. Young, Stephen K. Burley, Jeffrey C. Hoch, Genji Kurisu, Kyle Morris, Ardan Patwardhan, Sameer Velankar
(2023) arXiv doi: 10.48550/arXiv.2311.17640

Several community recommendations from this workshop have been incorporated into wwPDB validation reports, including map analysis, FSC validation, and map-model fitness using Q-score. As the next step, wwPDB plans to provide an overall quality percentile for map-model fitness, relative to other PDB entries, in the wwPDB validation report.

01/29/2024

Preprint Published on NextGen Archive

A new paper describes how the recently announced NextGen Archive provides centralized access to integrated annotations and enriched structural information for PDB data:

NextGen Archive: Centralising Access to Integrated Annotations and Enriched Structural Information by the Worldwide Protein Data Bank
Preeti Choudhary, Zukang Feng, John Berrisford, Henry Chao, Yasuyo Ikegawa, Ezra Peisach, Dennis W. Piehl, James Smith, Ahsan Tanweer, Mihaly Varadi, John D. Westbrook, Jasmine Y. Young, Ardan Patwardhan, Kyle L. Morris, Jeffrey C. Hoch, Genji Kurisu, Sameer Velankar, Stephen K. Burley
(2023) bioRxiv doi: 10.1101/2023.10.24.563739

The PDB NextGen archive provides sequence annotation from external resources such as UniProt, SCOP2 and Pfam in addition to the content provided in the structure model files in the PDB main archive. The inclusion of UniProtKB numbering facilitates effortless structural comparisons between experimental and predicted protein models. These PDBx/mmCIF files are directly compatible with various data visualization tools, simplifying the display of annotations on 3D structure views.

01/28/2024

Prizes Awarded at The Biophysical Society Japan Meeting

The wwPDB Foundation presented awards for outstanding student presentations at the 2023 meeting of The Biophysical Society Japan (November 14-16, Nagoya, Japan).

Keisuke Kasahara

Thermodynamic analysis of Fv-supercharged antibody–antigen interactions and control of interaction parameters
Keisuke Kasahara (1), Daisuke Kuroda (2), Jose Caaveiro (3), Satoru Nagatoishi (4), Kouhei Tsumoto (1,4)
1) Dept. Bioeng., Grad. Sch. Eng., Univ. Tokyo; 2) Res. Ctr. Drug Vaccine Dev., NIID; 3) Grad. Sch. Pharm. Sci., Kyusyu Univ.; 4) Med. Dev. Dev. Reg. Res. Ctr., Grad. Sch. Eng., Univ. Tokyo

Kyle Ian Peter Le Huray

Harnessing the power of machine learning and high-throughput molecular dynamics simulations to predict protein-lipid interactions
Kyle Ian Peter Le Huray (1,2), Frank Sobott (1), He Wang (3), Antreas Kalli (2)
1) School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds, UK; 2) Leeds Institute of Cardiovascular and Metabolic Medicine, School of Medicine, University of Leeds, Leeds, UK; 3) School of Computing, University of Leeds, Leeds, UK

Katsuhiko Minami

Replication-dependent histone (Repli-Histo) labeling revealed that chromatin motion can determine DNA replication timing
Katsuhiko Minami (1,2), Satoru Ide (1,2), Sachiko Tamura (1), Masato T. Kanemaki (1,2), Kazuhiro Maeshima (1,2)
1) National Institute of Genetics; 2) Graduate Institute for Advanced Studies, SOKENDAI

Many thanks to the meeting organizers and prize judges for making these awards possible.

The wwPDB Foundation was established in 2010 to raise funds in support of the outreach activities of the wwPDB. The Foundation raised funds to help support PDB50 events, workshops, and educational publications. The Foundation is chartered as a 501(c)(3) entity exclusively for scientific, literary, charitable, and educational purposes.

Consider supporting the next 50 years of PDB's spirit of openness, cooperation, and education with a donation to the wwPDB Foundation.

01/08/2024

Resources for Supporting the Extended PDB ID Format (pdb_00001abc)

wwPDB anticipates that all four-character PDB accession codes (PDB IDs) will be consumed by 2029.

With the continuous growth of the PDB archive, wwPDB has revised the PDB accession code format by extending its length and prepending “PDB” (e.g., "1abc" will become "pdb_00001abc"). This change will enable text-mining detection of PDB entries in the published literature and allow for more informative and transparent delivery of revised data files.
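The mapping from a four-character ID to the extended form, and the kind of text-mining match the prefix makes possible, can be sketched as follows. This is an illustration only: the zero-padding rule is inferred from the single "1abc" example above, and the function names and regular expressions are assumptions, not an official wwPDB specification.

```python
import re

# Assumption: a legacy PDB ID is four alphanumeric characters.
LEGACY_ID = re.compile(r"[0-9a-z]{4}")

# Assumption: an extended ID is "pdb_" followed by eight alphanumerics.
EXTENDED_ID = re.compile(r"\bpdb_[0-9a-z]{8}\b", re.IGNORECASE)

def to_extended_pdb_id(pdb_id):
    """Convert a four-character PDB ID such as "1abc" to "pdb_00001abc"."""
    core = pdb_id.strip().lower()
    if not LEGACY_ID.fullmatch(core):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    # Left-pad with zeros to eight characters, then prepend the prefix.
    return "pdb_" + core.rjust(8, "0")

def find_extended_ids(text):
    """Return all extended PDB IDs found in free text, lower-cased."""
    return [match.lower() for match in EXTENDED_ID.findall(text)]

example_text = "Models pdb_00001abc and PDB_12345678 were compared."
found = find_extended_ids(example_text)
```

A bare string such as "1abc" is hard to distinguish from ordinary text, whereas the prefixed form can be matched with a single pattern, which is the text-mining benefit described above.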

Entries with extended PDB IDs (12 characters) will not be compatible with the legacy PDB file format once four-character PDB IDs are consumed. wwPDB encourages scientific journals, the PDB community, and users to transition to the PDBx/mmCIF format and the extended PDB ID format as soon as possible.

Resources are available to help PDB users with this transition through the wwPDB resource portal page (Extended PDB ID With 12 Characters). This page links to useful resources for handling this change, including an FAQ on PDB ID extension, materials to learn more about PDBx/mmCIF format, and links to other PDBx/mmCIF resources and software tools. As the transition phase progresses, more training resources will be added to this page.

Additionally, a PDB “beta” archive will be provided during the transition phase in 2026. The directory structure of this “beta” archive will mirror the data organization of the PDB Versioned Archive in the form of https://files-beta.org/pub/pdb/data/entries/two-letter-hash/pdb_accession_code/entry_data_File_names. The two-letter hash will be based on the n-2 and n-3 characters, that is, the third-to-last and second-to-last characters of the accession code. For example, PDB entry PDB_12345678 will be under /67/. This will maintain consistency with the current PDB archive, where, for example, PDB entry 1abc is under /ab/.
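The hashing rule can be checked against both examples in the text with a short sketch. The function names are hypothetical, and the relative directory layout simply follows the two-letter-hash/pdb_accession_code pattern quoted above; the use of lower case for the accession code is an assumption.

```python
def two_letter_hash(pdb_id):
    """Return the third-to-last and second-to-last characters of a PDB ID.

    Works for both legacy ("1abc") and extended ("pdb_00001abc") IDs.
    """
    core = pdb_id.strip().lower()
    if len(core) < 4:
        raise ValueError(f"PDB ID too short: {pdb_id!r}")
    return core[-3:-1]

def entry_directory(extended_id):
    """Build the relative directory 'hash/accession_code/' for an entry.

    Assumption: the accession code appears in lower case in the path.
    """
    accession = extended_id.strip().lower()
    return f"{two_letter_hash(accession)}/{accession}/"

hash_extended = two_letter_hash("PDB_12345678")   # "67"
hash_legacy = two_letter_hash("1abc")             # "ab"
directory = entry_directory("PDB_12345678")       # "67/pdb_12345678/"
```

Because the hash is taken relative to the end of the identifier, "1abc" and its extended form "pdb_00001abc" both fall under "ab", which is the consistency with the current archive that the text describes.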

Once all four-character PDB accession codes are consumed, this PDB “beta” archive will become the PDB main archive and the current PDB archive will be removed.

Download example files containing extended PDB IDs for software adoption from GitHub.

wwPDB recently announced that PDB three-character Chemical Component IDs have been consumed. Five-character alphanumeric accession codes for CCD IDs are now issued by the OneDep system.

For further information, please contact us at info@wwpdb.org.

<I>Sample extended PDB ID</I>

01/03/2024

Time-stamped Copies of PDB and EMDB Archives

<I>New archive snapshots are available.</I>

A snapshot of the PDB Core archive (ftp://ftp.wwpdb.org, https://s3.rcsb.org) as of January 2, 2024 has been added to ftp://snapshots.wwpdb.org, https://s3snapshots.rcsb.org (AWS), and ftp://snapshots.pdbj.org. Snapshots have been archived annually since 2005 to provide readily identifiable data sets for research on the PDB archive.

The directory 20240101 includes the 214,121 experimentally determined structures and experimental data available at that time. Atomic coordinates and related metadata are available in PDBx/mmCIF, PDB, and XML file formats. The date and time stamp of each file indicates the last time the file was modified. The snapshot of the PDB Core Archive is 1,242 GB.

A snapshot of the EMDB Core archive (ftp://ftp.ebi.ac.uk/pub/databases/emdb/) as of January 1, 2024 can be found at ftp://ftp.ebi.ac.uk/pub/databases/emdb_vault/20240101/ and ftp://snapshots.pdbj.org/20240101/. The snapshot of the EMDB Core Archive contains map files and their metadata within XML files for both released and obsoleted entries (32,033 and 282, respectively) and is 14 TB in size.