Launch with Your Own Cryo-EM Data

Setting up ROCKET with your own datasets: cryo-EM/ET half maps

This tutorial shows how to refine a predicted model against your own cryo-EM or cryo-ET data with ROCKET.

For cryo-EM, the best way to run ROCKET currently is to refine one domain or chain at a time. We use chain H of PDB entry 8P4P as an example.

This path is best when you want to refine one chain or domain at a time against a cryo-EM or cryo-ET map.

Note: Precompute your MSA files first. ROCKET currently expects a3m or sto input from an external server or database. To use OpenFold locally, follow the sequence database download instructions; the databases require about a terabyte of storage. The --precomputed_alignment_dir flag defaults to alignments/, and ROCKET uses all alignments found there.

1. Collect the required files

The ROCKET preprocessing script expects input files organized as follows:

<working_directory>/
├── {file_id}_fasta/
│   └── {file_id}.fasta       # FASTA file containing the chain to refine
│                             # Header should be "> {file_id}"
│
├── {file_id}_data/
│   ├── *_half_map*.mrc       # For Cryo-EM data
│   └── <optional files>/     # e.g., predicted or docked models
│
└── alignments/               # (default: --precomputed_alignment_dir)
    └── {file_id}/
        └── *.a3m / *.hhr     # MSA files for the input sequence
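
Before running preprocessing, it can help to confirm that the working directory matches the layout above. The following is a minimal Python sketch, not part of ROCKET: the function name check_rocket_layout is hypothetical, the requirement of two half maps is an assumption (relax it if you use a single map), and the accepted MSA suffixes (.a3m, .sto, .hhr) are simply the ones mentioned on this page.

```python
from pathlib import Path


def check_rocket_layout(workdir, file_id, alignment_dir="alignments"):
    """Return a list of problems found in the expected input layout."""
    root = Path(workdir)
    problems = []

    # {file_id}_fasta/{file_id}.fasta with a "> {file_id}" header
    fasta = root / f"{file_id}_fasta" / f"{file_id}.fasta"
    if not fasta.is_file():
        problems.append(f"missing FASTA file: {fasta}")
    else:
        lines = fasta.read_text().splitlines()
        first = lines[0] if lines else ""
        # Whitespace between ">" and the identifier is tolerated here.
        if not first.startswith(">") or first[1:].strip() != file_id:
            problems.append(f"unexpected FASTA header: {first!r}")

    # {file_id}_data/*_half_map*.mrc (assumption: two half maps expected)
    data_dir = root / f"{file_id}_data"
    half_maps = sorted(data_dir.glob("*_half_map*.mrc"))
    if len(half_maps) < 2:
        problems.append(
            f"expected two half maps in {data_dir}, found {len(half_maps)}"
        )

    # alignments/{file_id}/ with at least one MSA file
    msa_dir = root / alignment_dir / file_id
    msas = [p for p in msa_dir.glob("*") if p.suffix in {".a3m", ".sto", ".hhr"}]
    if not msas:
        problems.append(f"no alignment files in {msa_dir}")

    return problems
```

Calling check_rocket_layout(".", "8p4p_H") from the working directory (with "8p4p_H" as a hypothetical file_id) returns an empty list when everything is in place, and otherwise one message per missing item.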

2. Run preprocessing

Once your data is organized this way, you are ready to run rk.preprocess.

A common scenario is having a model where the chains are already docked and even built to a certain degree, but a specific chain remains problematic. In this case, providing the docked chain of interest as --predocked_model speeds up preprocessing by skipping the docking search. Providing the rest of the docked chains as --fixed_model helps account for other parts of the map during refinement.

For example:

If you already have a predocked model and only a single post-processed map, you can pass --map alone instead of the two half maps.

Note: This shortcut only works when the docked placement is already known. If ROCKET still needs to search for the placement, you must provide both half maps with --map1 and --map2. In general, if you have low-resolution data and are trying to model only a small part of a larger complex, we recommend performing the docking separately with the full complex.
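
The rule for choosing between --map and --map1/--map2 can be written down as a small helper. This is a hedged Python sketch rather than ROCKET code: the function map_arguments is hypothetical, it only assembles the flags named in this section, and rk.preprocess may require further arguments that are not shown here.

```python
def map_arguments(half_maps=None, full_map=None,
                  predocked_model=None, fixed_model=None):
    """Assemble the map/model flags described above as an argv fragment."""
    args = []
    if half_maps is not None:
        if len(half_maps) != 2:
            raise ValueError("exactly two half maps are needed for --map1/--map2")
        args += ["--map1", str(half_maps[0]), "--map2", str(half_maps[1])]
    elif full_map is not None:
        # A single map is only sufficient when the placement is already known.
        if predocked_model is None:
            raise ValueError(
                "--map alone requires --predocked_model; "
                "a docking search needs both half maps"
            )
        args += ["--map", str(full_map)]
    else:
        raise ValueError("provide either two half maps or one full map")

    # Optional models: the docked chain of interest and the remaining chains.
    if predocked_model is not None:
        args += ["--predocked_model", str(predocked_model)]
    if fixed_model is not None:
        args += ["--fixed_model", str(fixed_model)]
    return args
```

The returned list is meant to be appended to whatever base command you use for rk.preprocess. Raising an error for a single map without a predocked model mirrors the note above and catches the mistake before preprocessing starts.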

3. Review the generated configs

rk.preprocess generates two YAML files under --output_dir for rk.refine. Review them before you start refinement.
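
One simple way to review the generated files is to print them side by side. The Python sketch below is only an illustration: the helper names are hypothetical, and it assumes the YAML files sit directly inside the directory you passed as --output_dir rather than in a subfolder.

```python
from pathlib import Path


def find_generated_configs(output_dir):
    """Return the YAML files directly under output_dir, sorted by name."""
    root = Path(output_dir)
    return sorted(p for p in root.iterdir() if p.suffix in {".yaml", ".yml"})


def print_configs(output_dir):
    """Print each config file with a header line for quick review."""
    configs = find_generated_configs(output_dir)
    if len(configs) != 2:
        # The text above says two files are generated; flag anything else.
        print(f"note: expected 2 config files, found {len(configs)}")
    for path in configs:
        print(f"===== {path.name} =====")
        print(path.read_text())
```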

If you want to generate another set of default config files:

The --mode both flag sets up the default phase 1 and phase 2 workflow. You can edit the saved config files if you want to test a specific condition.

4. Run refinement

Run the phase 1 config first:

This will start the refinement.

If you want live run tracking, see Track Refinement with Weights & Biases.

The standard workflow is phase 1 followed by the lower learning-rate phase 2:

Phase 2 requires an existing phase 1 folder. If you want to start with a lower learning rate, edit config_phase1.yaml directly.

5. Finalize geometry and B-factors

Afterwards, we recommend a short run of a standard refinement program to polish geometry and B-factors on the ROCKET output. We used phenix.refine in the paper.

A note from us

We hope to make ROCKET as useful and general as we can. If you run into setup issues, let us know and we will try to help.

Create an issue in our repo