Launch with Your Own Cryo-EM Data
Setting up ROCKET with your own datasets: cryo-EM/ET half maps
In this tutorial, we walk you through how to refine a prediction against your own cryo-EM/ET data with ROCKET.
For cryo-EM, the best way to run ROCKET at the moment is to refine one domain or chain at a time. We use chain H of PDB entry 8P4P as an example.
Note: You must precompute your MSA files (.a3m or .sto format) beforehand, either from an MSA server or from local sequence databases. To generate them locally with OpenFold, follow its sequence database download instructions (this requires disk space on the order of terabytes). For now, ROCKET expects a --precomputed_alignment_dir (default: alignments/), and all alignment files in that folder will be used. We are working on integrating a sequence server into the pipeline.
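A quick way to confirm, before preprocessing, that the alignment folder for your target actually contains MSA files is sketched below. It assumes the alignments/{file_id}/ layout described in the next section and counts only .a3m and .sto files as MSAs; `check_msas` is a hypothetical helper, not part of ROCKET.

```shell
# Hypothetical helper (not part of ROCKET): count .a3m/.sto MSA files in
# <alignment_dir>/<file_id>/ and return nonzero if the folder is missing or
# contains none. (find -maxdepth is supported by GNU and BSD find.)
check_msas() {
    if [ $# -ne 2 ]; then
        echo "usage: check_msas <alignment_dir> <file_id>" >&2
        return 2
    fi
    target="$1/$2"

    if [ ! -d "$target" ]; then
        echo "missing alignment folder: $target" >&2
        return 1
    fi

    # Only regular files at the top level of the folder are counted.
    n=$(find "$target" -maxdepth 1 -type f \( -name '*.a3m' -o -name '*.sto' \) | wc -l | tr -d ' ')
    if [ "$n" -eq 0 ]; then
        echo "no .a3m or .sto files in $target" >&2
        return 1
    fi
    echo "$n MSA file(s) found in $target"
}

# Usage, with the default folder name:
# check_msas alignments 8p4pH
```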
First things first: gather your data
The ROCKET preprocessing script expects input files organized as follows:
<working_directory>/
├── {file_id}_fasta/
│   └── {file_id}.fasta          # FASTA file containing the chain to refine
│                                # Header should be "> {file_id}"
│
├── {file_id}_data/
│   ├── *_half_map*.mrc          # For Cryo-EM data
│   └── <optional files>/        # e.g., predicted or docked models
│
└── alignments/                  # (default: --precomputed_alignment_dir)
    └── {file_id}/
        └── *.a3m / *.hhr        # MSA files for the input sequence
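The layout above can be created with a few shell commands. This is a minimal sketch, assuming you already have the chain's sequence as a string; `scaffold_rocket_inputs` and the example sequence are placeholders, not ROCKET tooling. You still need to copy the half maps and MSA files into the folders it creates.

```shell
# Hypothetical helper (not part of ROCKET): create the folder layout that
# rk.preprocess expects and write a FASTA file whose header is "> {file_id}".
scaffold_rocket_inputs() {
    if [ $# -ne 3 ]; then
        echo "usage: scaffold_rocket_inputs <working_dir> <file_id> <sequence>" >&2
        return 2
    fi
    wd=$1; file_id=$2; seq=$3

    mkdir -p "$wd/${file_id}_fasta" "$wd/${file_id}_data" "$wd/alignments/$file_id" || return 1

    # Header follows the convention in the layout: "> {file_id}"
    printf '> %s\n%s\n' "$file_id" "$seq" > "$wd/${file_id}_fasta/${file_id}.fasta"
}

# Example (placeholder sequence), run from the working directory:
# scaffold_rocket_inputs . 8p4pH MKTAYIAKQRQISFVKSHFSRQ
# Then copy the half maps into 8p4pH_data/ and the MSAs into alignments/8p4pH/.
```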
Run ROCKET Preprocessing
Once your data are organized this way, you are ready to run rk.preprocess.
A common scenario is having a model in which the chains are already docked, and even partially built, but one specific chain remains problematic. In this case, providing the docked chain of interest as --predocked_model speeds up preprocessing by skipping the docking search. Providing the rest of the docked chains as --fixed_model helps account for the other parts of the map during refinement.
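If your docked model is a single multi-chain PDB file, one way to produce these two inputs is to split its coordinate records by chain ID. This is a minimal sketch, assuming a PDB-format file (not mmCIF) in which the chain ID sits in column 22 of ATOM/HETATM records; `split_chain` and the file names are hypothetical. Header, TER, and CONECT records are dropped, so inspect the outputs before use.

```shell
# Hypothetical helper (not part of ROCKET): split a PDB-format model into the
# chain to refine (for --predocked_model) and all other chains (for
# --fixed_model). Only ATOM/HETATM records are kept; END is appended to both.
split_chain() {
    if [ $# -ne 4 ]; then
        echo "usage: split_chain <model.pdb> <chain_id> <chain_out.pdb> <rest_out.pdb>" >&2
        return 2
    fi
    awk -v chain="$2" -v chain_out="$3" -v rest_out="$4" '
        /^(ATOM  |HETATM)/ {
            # PDB column 22 holds the chain identifier.
            if (substr($0, 22, 1) == chain) print > chain_out
            else print > rest_out
        }
        END { print "END" > chain_out; print "END" > rest_out }
    ' "$1"
}

# Example (hypothetical input name):
# split_chain 8p4p_docked_all.pdb H 8p4pH_data/8p4pH_docked.pdb \
#     8p4pH_data/8p4pH-fixed-model-forchainH.pdb
```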
For example:
rk.preprocess \
--file_id 8p4pH \
--resolution 9.6 \
--method cryoem \
--output_dir 8p4pH_processed \
--precomputed_alignment_dir alignments/ \
--predocked_model 8p4pH_data/8p4pH_docked.pdb \
--fixed_model 8p4pH_data/8p4pH-fixed-model-forchainH.pdb \
--map1 8p4pH_data/emd_17425_half_map_1.map \
--map2 8p4pH_data/emd_17425_half_map_2.map \
--max_recycling_iters 20 \
--use_deepspeed_evoformer_attention
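Because preprocessing can take a while, it may be worth confirming that every file passed to rk.preprocess exists before launching it. This is a minimal sketch using the example paths above; `require_files` is a hypothetical helper that only checks that each file exists and is non-empty, not that it has the right format.

```shell
# Hypothetical helper (not part of ROCKET): report every missing or empty path
# and return nonzero if there was at least one.
require_files() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example with the paths from the command above:
# require_files 8p4pH_data/8p4pH_docked.pdb \
#               8p4pH_data/8p4pH-fixed-model-forchainH.pdb \
#               8p4pH_data/emd_17425_half_map_1.map \
#               8p4pH_data/emd_17425_half_map_2.map \
#     && echo "inputs present; run rk.preprocess"
```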
Edit YAML File
rk.preprocess automatically generates two YAML files under --output_dir for the subsequent rk.refine run. Review and edit those YAML files directly.
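Before editing, it can help to keep untouched copies of the generated files so you can review your changes later. This is a minimal sketch, assuming the generated files are named ROCKET_config_phase1.yaml and ROCKET_config_phase2.yaml (the names used in the rk.refine commands on this page); `snapshot_configs` and the `.default` suffix are hypothetical choices.

```shell
# Hypothetical helper (not part of ROCKET): check that both generated configs
# exist and save a pristine .default copy of each. An existing snapshot is
# never overwritten, so re-running after editing is safe.
snapshot_configs() {
    out_dir=${1:-.}
    for phase in 1 2; do
        cfg="$out_dir/ROCKET_config_phase${phase}.yaml"
        if [ ! -f "$cfg" ]; then
            echo "not found: $cfg" >&2
            return 1
        fi
        [ -e "$cfg.default" ] || cp "$cfg" "$cfg.default"
    done
}

# After editing, review what changed relative to the generated defaults:
# snapshot_configs 8p4pH_processed
# diff -u 8p4pH_processed/ROCKET_config_phase1.yaml.default \
#         8p4pH_processed/ROCKET_config_phase1.yaml
```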
If you want to generate another set of default config files:
rk.config --mode both --datamode cryoem --working-dir 8p4pH_processed --file-id 8p4pH
The --mode both flag in the command above sets ROCKET up to run its default "aggressive" phase 1 followed by a fine-tuning phase 2. You can optionally modify the saved config files to test a specific condition for model building.
Iteratively Refine Predictions
Run the phase 1 config file you generated or edited above, e.g.:
rk.refine 8p4pH_processed/ROCKET_config_phase1.yaml
This should get you going with the refinement!
The standard protocol is to run the default phase 1 and follow it with the lower-learning-rate default phase 2, e.g.:
rk.refine 8p4pH_processed/ROCKET_config_phase2.yaml
Phase 2 requires an existing phase 1 output folder to start from. If you would like to experiment with a lower learning rate from the start, edit ROCKET_config_phase1.yaml accordingly.
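Because phase 2 starts from the phase 1 output, the two runs can be chained so that phase 2 only starts if phase 1 exits successfully. This is a minimal sketch, assuming the processed folder and config names from the example above and that rk.refine returns a nonzero exit status on failure; `run_rocket_phases` and the `ROCKET_REFINE` override variable are hypothetical.

```shell
# Hypothetical wrapper (not part of ROCKET): run phase 1, then phase 2 only if
# phase 1 succeeded. ROCKET_REFINE defaults to rk.refine; set it to "echo" for
# a dry run that just prints the config paths.
ROCKET_REFINE=${ROCKET_REFINE:-rk.refine}

run_rocket_phases() {
    if [ $# -ne 1 ]; then
        echo "usage: run_rocket_phases <processed_dir>" >&2
        return 2
    fi
    "$ROCKET_REFINE" "$1/ROCKET_config_phase1.yaml" || {
        echo "phase 1 failed; not starting phase 2" >&2
        return 1
    }
    "$ROCKET_REFINE" "$1/ROCKET_config_phase2.yaml"
}

# Usage:
# run_rocket_phases 8p4pH_processed
# ROCKET_REFINE=echo run_rocket_phases 8p4pH_processed   # dry run
```

Chaining on the exit status (rather than, say, checking for a phase 1 folder) keeps the wrapper independent of ROCKET's output naming, which this sketch does not assume.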
Finalize geometry and B-factors
We recommend a brief round of standard refinement (we used phenix.refine in the paper) to refine B-factors and polish the geometry of the models that come straight out of ROCKET.
A note from us
We hope to make ROCKET as useful and general as we can. Please let us know if you run into issues setting up your own refinements, and we'll see how we can help!
Create an issue in our repo