Low Resolution GroEL Sub-Tomogram Average

Model Building at Low Resolution

This tutorial walks through the refinement of E. coli GroEL chain H (PDB ID 8P4P) from Figure 5 in our paper:

"Extracting Information from Low Resolution Data"

The goal is to show how ROCKET can be used to build a chain in a cryo-EM model where the map is particularly low resolution or noisy.

Collect the necessary files

We have prepared ROCKET inputs for download at https://zenodo.org/uploads/15084558.

Download and decompress the file:

tar xvf GroEL_ChainH_Tutorial.tar.gz

You will see a folder organized in the following manner:

data_for_8p4pH/
├── 8p4pH_data
│   ├── 8p4pH_docked.pdb
│   ├── 8p4pH-fixed-model-forchainH.pdb
│   ├── emd_17425_half_map_1.map
│   └── emd_17425_half_map_2.map
├── 8p4pH_fasta
│   └── 8p4pH.fasta
├── 8p4pH_preprocessing_outputs
│   ├── 8p4pH.fasta 
│   ├── alignments 
│   ├── docking_outputs
│   ├── predictions
│   ├── processed_predicted_files
│   ├── ROCKET_config_phase1.yaml
│   ├── ROCKET_config_phase2.yaml
│   ├── ROCKET_inputs
│   └── timings.json
├── alignments
│   └── 8p4pH
└── preprocessing_command.txt

For reproducibility, we have prepared all the necessary files in the 8p4pH_preprocessing_outputs folder. If you want to do the preparation from scratch, check the cryo-EM tutorial and the API documentation for rk.preprocess.
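Before launching the refinement, it can help to confirm that the archive unpacked as expected. The following is a minimal sketch, not part of ROCKET: the relative paths are copied from the directory listing above, and the function name missing_inputs is a hypothetical helper introduced only for this illustration.

```python
from pathlib import Path

# Relative paths copied from the directory listing in this tutorial.
EXPECTED = [
    "8p4pH_data/8p4pH_docked.pdb",
    "8p4pH_data/8p4pH-fixed-model-forchainH.pdb",
    "8p4pH_data/emd_17425_half_map_1.map",
    "8p4pH_data/emd_17425_half_map_2.map",
    "8p4pH_fasta/8p4pH.fasta",
    "8p4pH_preprocessing_outputs/ROCKET_config_phase1.yaml",
    "8p4pH_preprocessing_outputs/ROCKET_config_phase2.yaml",
]


def missing_inputs(root):
    """Return the expected relative paths that are absent under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumes the archive was extracted in the current working directory.
    missing = missing_inputs("data_for_8p4pH")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files found.")
```

An empty result means the data, sequence, and config files named in the listing are in place; it does not validate their contents.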

Refine starting prediction with ROCKET

The preprocessing command automatically generates two config YAML files for rk.refine. You can run the refinement from inside the 8p4pH_preprocessing_outputs folder with:

rk.refine ROCKET_config_phase1.yaml

The detailed settings are:

note: phase1_groEL
data:
  datamode: cryoem
  free_flag: R-free-flags
  testset_value: 1
  min_resolution: 3.0
  max_resolution: null
  voxel_spacing: 4.5
  msa_subratio: null
  w_plddt: 0.0
  downsample_ratio: null
paths:
  path: ./
  file_id: 8p4pH
  template_pdb: null
  input_msa: null
  sub_msa_path: null
  sub_delmat_path: null
  msa_feat_init_path: null
  starting_bias: null
  starting_weights: null
  uuid_hex: ec89b134ac
execution:
  cuda_device: 0
  num_of_runs: 1
  verbose: false
algorithm:
  bias_version: 3
  iterations: 400
  init_recycling: 4
  domain_segs: null
  optimization:
    additive_learning_rate: 0.0001
    multiplicative_learning_rate: 0.001
    weight_decay: 0.0001
    batch_sub_ratio: 0.7
    number_of_batches: 1
    rbr_opt_algorithm: lbfgs
    rbr_lbfgs_learning_rate: 150.0
    smooth_stage_epochs: null
    phase2_final_lr: 0.001
    l2_weight: 0.0
  features:
    solvent: true
    sfc_scale: true
    refine_sigmaA: true
    additional_chain: false
    bias_from_fullmsa: false
    chimera_profile: false

The refinement trajectory will be saved to ROCKET_outputs.

Note: the preprocessed data {file_id}-Edata.mtz is oversampled for cryo-EM/ET, as this helps at the docking stage. This is why we used a lower learning rate in the refinement YAML above. We have since implemented a config.downsample_ratio parameter; setting config.downsample_ratio=2 should automatically account for the oversampling without needing to change the learning rate.
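To try the downsample_ratio setting without hand-editing the file, the config can be rewritten with a small script. This is a sketch under stated assumptions: it treats the YAML as plain text and replaces only the downsample_ratio line, the helper set_downsample_ratio and the output filename are hypothetical, and it leaves the learning rates untouched because this tutorial does not state what values to use with downsampling enabled.

```python
import re
from pathlib import Path

# Matches a single-line "downsample_ratio: <value>" entry, keeping its indentation.
_RATIO_LINE = re.compile(r"^([ \t]*downsample_ratio:[ \t]*).*$", flags=re.MULTILINE)


def set_downsample_ratio(yaml_text, ratio):
    """Return `yaml_text` with its downsample_ratio entry set to `ratio`."""
    new_text, count = _RATIO_LINE.subn(lambda m: f"{m.group(1)}{ratio}", yaml_text)
    if count != 1:
        raise ValueError(f"expected exactly one downsample_ratio entry, found {count}")
    return new_text


if __name__ == "__main__":
    source = Path("ROCKET_config_phase1.yaml")
    # Hypothetical output name, chosen so the original config is not overwritten.
    target = Path("ROCKET_config_phase1_downsample.yaml")
    if source.exists():
        target.write_text(set_downsample_ratio(source.read_text(), 2))
        print(f"Wrote {target}")
```

Writing to a separate file keeps the original config available for comparison with the run described above.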

Find the best scoring model

ROCKET will highlight the best scoring model during its refinement trajectory and save the final MSA cluster profile bias and weight tensors as best_feat_weights_H_{best_iteration}.pt and best_msa_bias_H_{best_iteration}.pt. These tensors can be used to re-predict the conformation found. The postRBR_{best_iteration}.pdb file can also be accessed directly in the output folder.
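The file-name patterns above can be used to locate the best-iteration files programmatically. The sketch below is not a ROCKET utility: find_best_outputs is a hypothetical helper, it searches recursively because the exact layout under ROCKET_outputs is not described here, and it assumes the bias tensor and postRBR model sit in the same directory as the weights file.

```python
from pathlib import Path

WEIGHTS_PREFIX = "best_feat_weights_H_"


def find_best_outputs(output_dir):
    """List best-iteration labels and the file paths derived from them."""
    found = []
    for weights in sorted(Path(output_dir).rglob(f"{WEIGHTS_PREFIX}*.pt")):
        # The iteration label is whatever follows the fixed prefix in the file name.
        label = weights.stem[len(WEIGHTS_PREFIX):]
        found.append({
            "iteration": label,
            "weights": weights,
            # Assumption: sibling files live next to the weights file.
            "bias": weights.with_name(f"best_msa_bias_H_{label}.pt"),
            "model": weights.with_name(f"postRBR_{label}.pdb"),
        })
    return found


if __name__ == "__main__":
    for entry in find_best_outputs("ROCKET_outputs"):
        print("best iteration:", entry["iteration"])
        for key in ("weights", "bias", "model"):
            status = "found" if entry[key].exists() else "not found"
            print(f"  {key}: {entry[key]} ({status})")
```

Any path reported as "not found" suggests the same-directory assumption does not hold for your run, in which case the output folder should be inspected by hand.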

Finalize geometry and B-factors

This will have less of an effect at low resolution, but we recommend a brief standard refinement run (phenix.refine was used in the paper) to refine B-factors and polish the geometry of the best scoring model coming straight out of ROCKET.

Note

There is a degree of stochasticity in the gradient descent and per-iteration rigid body refinement protocols. For this reason, there can be slight differences between ROCKET refinements started with the same inputs and parameters.

