Low Resolution GroEL Sub-Tomogram Average
Model Building at Low Resolution
This tutorial walks through refinement of the E. coli GroEL chain H (PDB ID 8P4P) from Figure 5 in our paper:
"Extracting Information from Low Resolution Data"
The goal is to show how ROCKET can be used to build a chain in a cryoEM model where the map is particularly low resolution or noisy.

Collect the necessary files
We have prepared ROCKET inputs for download at https://zenodo.org/uploads/15084558.
Download and decompress the file:
tar xvf GroEL_ChainH_Tutorial.tar.gz
You will see a folder organized in the following manner:
data_for_8p4pH/
├── 8p4pH_data
│   ├── 8p4pH_docked.pdb
│   ├── 8p4pH-fixed-model-forchainH.pdb
│   ├── emd_17425_half_map_1.map
│   └── emd_17425_half_map_2.map
├── 8p4pH_fasta
│   └── 8p4pH.fasta
├── 8p4pH_preprocessing_outputs
│   ├── 8p4pH.fasta
│   ├── alignments
│   ├── docking_outputs
│   ├── predictions
│   ├── processed_predicted_files
│   ├── ROCKET_config_phase1.yaml
│   ├── ROCKET_config_phase2.yaml
│   ├── ROCKET_inputs
│   └── timings.json
├── alignments
│   └── 8p4pH
└── preprocessing_command.txt
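As a quick sanity check after decompressing, the folder listing above can be compared against what is actually on disk. The following is a minimal sketch using only the Python standard library; the helper name `missing_entries` is hypothetical and not part of ROCKET, and the list of paths is copied from the listing above (only a subset is checked).

```python
from pathlib import Path

# Relative paths copied from the folder listing above (a subset, not exhaustive).
EXPECTED = [
    "8p4pH_data/8p4pH_docked.pdb",
    "8p4pH_data/8p4pH-fixed-model-forchainH.pdb",
    "8p4pH_data/emd_17425_half_map_1.map",
    "8p4pH_data/emd_17425_half_map_2.map",
    "8p4pH_fasta/8p4pH.fasta",
    "8p4pH_preprocessing_outputs/ROCKET_config_phase1.yaml",
    "8p4pH_preprocessing_outputs/ROCKET_config_phase2.yaml",
    "8p4pH_preprocessing_outputs/ROCKET_inputs",
    "preprocessing_command.txt",
]


def missing_entries(root):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries("data_for_8p4pH")
    print("All expected entries found" if not missing else f"Missing: {missing}")
```

Run it from the directory that contains data_for_8p4pH/. An empty result means every listed entry is present.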
For reproducibility, we have prepared all the necessary files in the 8p4pH_preprocessing_outputs folder. See the cryoEM tutorial and the API for rk.preprocess if you want to do the preparation from scratch.
Refine starting prediction with ROCKET
The preprocessing command automatically generates two YAML config files for rk.refine. To run the refinement, change into the 8p4pH_preprocessing_outputs folder and run:
rk.refine ROCKET_config_phase1.yaml
The detailed settings in ROCKET_config_phase1.yaml are:
note: phase1_groEL
data:
  datamode: cryoem
  free_flag: R-free-flags
  testset_value: 1
  min_resolution: 3.0
  max_resolution: null
  voxel_spacing: 4.5
  msa_subratio: null
  w_plddt: 0.0
  downsample_ratio: null
paths:
  path: ./
  file_id: 8p4pH
  template_pdb: null
  input_msa: null
  sub_msa_path: null
  sub_delmat_path: null
  msa_feat_init_path: null
  starting_bias: null
  starting_weights: null
  uuid_hex: ec89b134ac
execution:
  cuda_device: 0
  num_of_runs: 1
  verbose: false
algorithm:
  bias_version: 3
  iterations: 400
  init_recycling: 4
  domain_segs: null
  optimization:
    additive_learning_rate: 0.0001
    multiplicative_learning_rate: 0.001
    weight_decay: 0.0001
    batch_sub_ratio: 0.7
    number_of_batches: 1
    rbr_opt_algorithm: lbfgs
    rbr_lbfgs_learning_rate: 150.0
    smooth_stage_epochs: null
    phase2_final_lr: 0.001
    l2_weight: 0.0
  features:
    solvent: true
    sfc_scale: true
    refine_sigmaA: true
    additional_chain: false
    bias_from_fullmsa: false
    chimera_profile: false
The refinement trajectory will be saved to the ROCKET_outputs folder.
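If you prefer to launch the refinement from a script rather than typing the command by hand, the call can be wrapped with `subprocess`. This is only a sketch: `refine_command` and `run_refinement` are hypothetical helper names, the default working directory assumes you start from the folder containing data_for_8p4pH/, and the only command used is the rk.refine invocation shown above.

```python
import subprocess
from pathlib import Path


def refine_command(config_name="ROCKET_config_phase1.yaml"):
    """Build the argument list for the rk.refine call shown above."""
    return ["rk.refine", config_name]


def run_refinement(workdir="data_for_8p4pH/8p4pH_preprocessing_outputs",
                   config_name="ROCKET_config_phase1.yaml"):
    """Run rk.refine inside workdir and raise if it exits with an error."""
    workdir = Path(workdir)
    config = workdir / config_name
    if not config.is_file():
        raise FileNotFoundError(f"Config not found: {config}")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(refine_command(config_name), cwd=workdir, check=True)


# Example (requires ROCKET to be installed):
# run_refinement()
```

Checking for the config file first gives a clearer error than letting the command fail when it is started from the wrong folder.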
Note: the preprocessed data file {file_id}-Edata.mtz is oversampled for cryo-EM/ET, as this helps at the docking stage. This is why we used a lower learning rate in the refinement YAML above. We have now implemented a config.downsample_ratio parameter that, when set to config.downsample_ratio=2, should automatically account for the oversampling without needing to change the learning rate.
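To try the downsample_ratio setting without editing the generated config by hand, the key can be rewritten in a copy of the file. The sketch below works on the config text with a regular expression so that it needs no YAML library; `set_downsample_ratio` is a hypothetical helper, not part of ROCKET. It leaves the learning rates unchanged, since the values appropriate with this setting are not stated here.

```python
import re


def set_downsample_ratio(config_text, ratio=2):
    """Return config_text with the value of downsample_ratio replaced by ratio."""
    # Match the key at any indentation; [ \t] keeps the match on a single line.
    pattern = re.compile(r"^([ \t]*downsample_ratio:[ \t]*).*$", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{ratio}", config_text)
    if count != 1:
        raise ValueError(f"Expected exactly one downsample_ratio key, found {count}")
    return new_text


# Example: write a modified copy next to the original config.
# The output filename is an arbitrary choice for illustration.
# from pathlib import Path
# src = Path("ROCKET_config_phase1.yaml")
# Path("ROCKET_config_phase1_downsample2.yaml").write_text(
#     set_downsample_ratio(src.read_text())
# )
```

Writing to a new file keeps the generated config intact for comparison.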
Find the best scoring model
ROCKET will highlight the best scoring model in its refinement trajectory and save the final MSA cluster profile weight and bias tensors as best_feat_weights_H_{best_iteration}.pt and best_msa_bias_H_{best_iteration}.pt, which can be used to re-predict the conformation found. The postRBR_{best_iteration}.pdb file can also be accessed directly in the output folder.
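The file names above follow a fixed pattern, so the best iteration can be recovered by searching the output folder. The sketch below is a hypothetical helper (`find_best_outputs` is not part of ROCKET) and assumes the two .pt files and the postRBR PDB file sit in the same directory somewhere under ROCKET_outputs. The exact sub-folder layout is not specified here, so the search is recursive.

```python
import re
from pathlib import Path


def find_best_outputs(output_dir="ROCKET_outputs", chain="H"):
    """Locate the best-iteration files named in the text under output_dir."""
    output_dir = Path(output_dir)
    pattern = re.compile(rf"best_msa_bias_{re.escape(chain)}_(\d+)\.pt$")
    results = []
    for bias in sorted(output_dir.rglob(f"best_msa_bias_{chain}_*.pt")):
        match = pattern.search(bias.name)
        if match is None:
            continue
        iteration = match.group(1)  # kept as a string to preserve any padding
        run_dir = bias.parent
        weights = run_dir / f"best_feat_weights_{chain}_{iteration}.pt"
        model = run_dir / f"postRBR_{iteration}.pdb"
        results.append({
            "iteration": int(iteration),
            "msa_bias": bias,
            # Sibling files are assumed to share the directory; None if absent.
            "feat_weights": weights if weights.is_file() else None,
            "model": model if model.is_file() else None,
        })
    return results


if __name__ == "__main__":
    for entry in find_best_outputs():
        print(entry["iteration"], entry["model"])
```

With num_of_runs set to 1, as in the config above, a single entry would be expected.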

Finalize geometry and B-factors
This will have less of an effect at low resolution, but we recommend a brief standard refinement run (phenix.refine was used in the paper) to refine B-factors and polish the geometry of the best scoring model coming straight out of ROCKET.
Note
There is a degree of stochasticity in the gradient descent and per-iteration rigid body refinement protocols. For this reason, there can be slight differences between ROCKET refinements started with the same inputs and parameters.