Setting up ROCKET with your own datasets -- CryoEM/ET half maps
This tutorial shows how to refine a predicted model against your own cryo-EM or cryo-ET data with ROCKET.
At the moment, ROCKET works best on cryo-EM and cryo-ET maps when you refine one domain or chain at a time. We use chain H of PDB entry 8P4P as the running example.
Note: Precompute your MSA files first. ROCKET currently expects alignments in a3m or sto format, obtained from an external server or database. To generate them locally with OpenFold, follow the sequence database download instructions; the databases require about a terabyte of storage. The --precomputed_alignment_dir flag defaults to alignments/, and ROCKET uses every alignment it finds there.
1. Collect the required files
The ROCKET preprocessing script expects input files organized as follows:
<working_directory>/
├── {file_id}_fasta/
│   └── {file_id}.fasta        # FASTA file containing the chain to refine
│                              # Header should be "> {file_id}"
│
├── {file_id}_data/
│   ├── *_half_map*.mrc        # For Cryo-EM data
│   └── <optional files>/      # e.g., predicted or docked models
│
└── alignments/                # (default: --precomputed_alignment_dir)
    └── {file_id}/
        └── *.a3m / *.hhr      # MSA files for the input sequence
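Before running preprocessing, it can help to confirm that the layout above is complete. The following is a minimal sketch of such a check, not part of ROCKET itself: the function name `check_layout` and its parameters are hypothetical, and the alignment check looks for `.a3m` or `.sto` files because those are the formats named in the note above. Pass `require_half_maps=False` if you plan to use a single post-processed map with a predocked model.

```python
from pathlib import Path


def _glob(directory, pattern):
    """Glob inside a directory, returning [] if the directory does not exist."""
    return sorted(directory.glob(pattern)) if directory.is_dir() else []


def check_layout(workdir, file_id, alignment_dir="alignments", require_half_maps=True):
    """Return a list of problems; an empty list means the layout looks complete.

    Hypothetical helper: it only mirrors the directory tree described in this
    tutorial and does not call ROCKET.
    """
    root = Path(workdir)
    problems = []

    # FASTA file with the chain to refine; the header should be "> {file_id}".
    fasta = root / f"{file_id}_fasta" / f"{file_id}.fasta"
    if not fasta.is_file():
        problems.append(f"missing FASTA file: {fasta}")
    else:
        lines = fasta.read_text().splitlines()
        header = lines[0] if lines else ""
        # Compare the header text after '>' with surrounding whitespace removed.
        if not header.startswith(">") or header[1:].strip() != file_id:
            problems.append(f"FASTA header should be '> {file_id}', found {header!r}")

    # Half maps matching the pattern shown in the tree.
    if require_half_maps:
        data_dir = root / f"{file_id}_data"
        half_maps = _glob(data_dir, "*_half_map*.mrc")
        if len(half_maps) < 2:
            problems.append(f"found {len(half_maps)} half map(s) in {data_dir}, expected 2")

    # Precomputed alignments for the input sequence.
    msa_dir = root / alignment_dir / file_id
    if not (_glob(msa_dir, "*.a3m") or _glob(msa_dir, "*.sto")):
        problems.append(f"no .a3m or .sto alignments found in {msa_dir}")

    return problems
```

Calling `check_layout(".", "my_chain")` from the working directory (with `my_chain` standing in for your own file ID) returns one message per missing or malformed item.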
2. Run preprocessing
Once your data is organized this way, you are ready to run rk.preprocess.
A common scenario is having a model where the chains are already docked and even built to a certain degree, but a specific chain remains problematic. In this case, providing the docked chain of interest as --predocked_model speeds up preprocessing by skipping the docking search. Providing the rest of the docked chains as --fixed_model helps account for other parts of the map during refinement.
For example:
If you already have a predocked model and only a single post-processed map, you can use --map alone instead:
Note: This shortcut only works when the docked placement is already known, so use --map only in the predocked case. If ROCKET still needs to search for the placement, you must provide both half maps with --map1 and --map2. In general, if you have low-resolution data and are modeling only a small part of a larger complex, we recommend performing the docking separately with the full complex.
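The rule for choosing between --map and --map1/--map2 can be written down as a small decision function. This is a sketch for illustration only: `map_arguments` and its parameter names are hypothetical, it is not ROCKET's own validation logic, and it emits only the flags named in this section. Any other arguments that rk.preprocess needs are deliberately left out.

```python
def map_arguments(predocked_model=None, fixed_model=None,
                  full_map=None, half_map1=None, half_map2=None):
    """Build the model and map flags for rk.preprocess as a list of strings.

    Hypothetical helper: it covers only the flags discussed in this section.
    """
    args = []
    if predocked_model:
        # Docked chain of interest; lets preprocessing skip the docking search.
        args += ["--predocked_model", str(predocked_model)]
    if fixed_model:
        # Remaining docked chains, accounting for the rest of the map.
        args += ["--fixed_model", str(fixed_model)]

    if half_map1 and half_map2:
        # Both half maps: usable whether or not the placement is known.
        args += ["--map1", str(half_map1), "--map2", str(half_map2)]
    elif full_map and predocked_model:
        # Single post-processed map: only valid with a known placement.
        args += ["--map", str(full_map)]
    else:
        raise ValueError(
            "provide both half maps, or a single map together with a predocked model"
        )
    return args
```

For instance, `map_arguments(full_map="post.mrc")` raises a `ValueError`, because a single map without a predocked model would leave the placement search without half maps.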
3. Review the generated configs
rk.preprocess generates two YAML files under --output_dir for rk.refine. Review them before you start refinement.
The generated phase 1 and phase 2 configs are usually the best first run before you start tuning.
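One quick way to review the generated files is to list every YAML file under the output directory and print its contents. The sketch below uses only the Python standard library and assumes nothing about the keys inside the configs. The function names are hypothetical, and the recursive search is a precaution because this tutorial does not state how deep under --output_dir the files are written.

```python
from pathlib import Path


def list_generated_configs(output_dir):
    """Return all *.yaml files found anywhere under output_dir, sorted by path."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.yaml"))


def print_configs(output_dir):
    """Print each config file with a header line so the two phases are easy to compare."""
    configs = list_generated_configs(output_dir)
    if len(configs) != 2:
        # The tutorial says preprocessing writes two YAML files; flag anything else.
        print(f"note: found {len(configs)} YAML file(s), expected 2")
    for path in configs:
        print(f"==> {path}")
        print(path.read_text())
```

Calling `print_configs("my_output_dir")`, where `my_output_dir` stands for whatever you passed as --output_dir, prints both configs one after the other.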
If you want to generate another set of default config files:
The --mode both flag sets up the default phase 1 and phase 2 workflow. You can edit the saved config files if you want to test a specific condition.
4. Run refinement
The standard workflow is phase 1 followed by the lower learning-rate phase 2:
Phase 2 requires an existing phase 1 folder. If you want to start with a lower learning rate, edit config_phase1.yaml directly.
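Because phase 2 depends on the phase 1 folder, a small pre-flight check can catch a missing or empty folder before you launch phase 2. This is only a sketch: `phase1_ready` is a hypothetical helper, and the folder path is whatever your own phase 1 run produced, since this tutorial does not name it.

```python
from pathlib import Path


def phase1_ready(phase1_dir):
    """Return True if phase1_dir exists, is a directory, and contains at least one entry."""
    folder = Path(phase1_dir)
    return folder.is_dir() and any(folder.iterdir())
```

If `phase1_ready(...)` returns `False`, run phase 1 first, or edit config_phase1.yaml if your goal is simply to start with a lower learning rate.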
5. Finalize geometry and B-factors
We recommend a short standard refinement run afterwards. We used phenix.refine in the paper. This helps polish geometry and B-factors on the ROCKET output.
A note from us
We hope to make ROCKET as useful and general as we can. If you run into setup issues, let us know and we will try to help.