Launch with Your Own X-ray Data

Setting up ROCKET with your own datasets -- X-ray crystallography

This tutorial shows how to refine a prediction against your own X-ray dataset with ROCKET.

We use PDB entry 1LJ5 as the example system.


This path is best when you already have experimental data, a target sequence, and precomputed alignments ready.

Note: Precompute your MSA files first. ROCKET currently expects a3m or sto input from an external server or database. To generate alignments locally with OpenFold, follow the sequence database download instructions; the databases require about a terabyte of storage. The --precomputed_alignment_dir flag defaults to alignments/, and ROCKET will use all alignments found there.

1. Collect the required files

The ROCKET preprocessing script expects input files organized as follows:

<working_directory>/
├── {file_id}_fasta/
│   └── {file_id}.fasta       # FASTA file containing the chain to refine
│                             # Header should be "> {file_id}"
│
├── {file_id}_data/
│   ├── *.mtz                 # For X-ray data
│   └── <optional files>/     # e.g., predicted or docked models
│
├── alignments/               # (default: --precomputed_alignment_dir)
│   └── {file_id}/
│       └── *.a3m / *.hhr     # MSA files for the input sequence
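
Before running preprocessing, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of ROCKET: `check_rocket_layout` is a hypothetical helper name, and the accepted alignment extensions (a3m, sto, hhr) are taken from the note and the tree above.

```python
from pathlib import Path


def check_rocket_layout(workdir, file_id, alignment_dir="alignments"):
    """Return a list of problems; an empty list means the layout looks complete.

    Hypothetical helper (not a ROCKET function) that mirrors the directory
    tree described in this tutorial.
    """
    root = Path(workdir)
    problems = []

    # FASTA file and its header line, expected to be "> {file_id}"
    fasta = root / f"{file_id}_fasta" / f"{file_id}.fasta"
    if not fasta.is_file():
        problems.append(f"missing FASTA file: {fasta}")
    else:
        lines = fasta.read_text().splitlines()
        header = lines[0] if lines else ""
        if not header.startswith(">") or header[1:].strip() != file_id:
            problems.append(f"unexpected FASTA header in {fasta}: {header!r}")

    # At least one MTZ file with the X-ray data
    data_dir = root / f"{file_id}_data"
    if not data_dir.is_dir() or not any(data_dir.glob("*.mtz")):
        problems.append(f"no .mtz file in: {data_dir}")

    # At least one alignment file for this sequence
    msa_dir = root / alignment_dir / file_id
    patterns = ("*.a3m", "*.sto", "*.hhr")
    has_msa = msa_dir.is_dir() and any(
        True for pattern in patterns for _ in msa_dir.glob(pattern)
    )
    if not has_msa:
        problems.append(f"no alignment files in: {msa_dir}")

    return problems


if __name__ == "__main__":
    # Example: check the current directory for the 1LJ5 inputs
    for problem in check_rocket_layout(".", "1lj5"):
        print(problem)
```

The function only reports what is missing and never creates or moves files, so it is safe to run repeatedly while you assemble the inputs.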

2. Run preprocessing

Once your files are organized, run rk.preprocess:

Note: If the MTZ file contains more than one useful column set, Phaser will choose the best one automatically. In most cases, intensities are preferred if they are available. If you want to force a specific column set, provide an MTZ file that only contains that data. You can inspect the file with:
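
One way to list the column sets from Python is sketched below. It assumes the third-party gemmi library is installed and is not necessarily the command the ROCKET authors use; `read_mtz_columns` and `summarize_mtz_columns` are hypothetical helper names. The grouping relies on the standard MTZ column type codes (J for intensities, F for amplitudes, K and G for their anomalous counterparts).

```python
def summarize_mtz_columns(columns):
    """Group (label, type) pairs by the kind of data they hold.

    Standard MTZ type codes used here: J = intensity, K = I(+)/I(-),
    F = amplitude, G = F(+)/F(-). Everything else (indices, sigmas,
    free-R flags, phases, ...) is reported under "other".
    """
    kinds = {"J": "intensity", "K": "intensity", "F": "amplitude", "G": "amplitude"}
    summary = {"intensity": [], "amplitude": [], "other": []}
    for label, col_type in columns:
        summary[kinds.get(col_type, "other")].append(label)
    return summary


def read_mtz_columns(mtz_path):
    """Read (label, type) pairs from an MTZ file using gemmi."""
    import gemmi  # third-party dependency; assumed to be installed

    mtz = gemmi.read_mtz_file(str(mtz_path))
    return [(col.label, col.type) for col in mtz.columns]


if __name__ == "__main__":
    import sys

    # Usage: python inspect_mtz.py path/to/data.mtz
    summary = summarize_mtz_columns(read_mtz_columns(sys.argv[1]))
    for kind, labels in summary.items():
        print(f"{kind}: {', '.join(labels) or '(none)'}")
```

If both intensity and amplitude columns are listed and you want to force one of them, write a new MTZ file that keeps only the column set you want.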

3. Review the generated configs

rk.preprocess generates two YAML files under --output_dir for rk.refine. Review them before you start refinement.
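
A quick way to review the two files is to look at where they differ. The sketch below uses only the Python standard library. The name config_phase1.yaml appears later in this tutorial, while config_phase2.yaml and the `rocket_outputs` directory are assumptions for illustration, so substitute the paths that rk.preprocess actually wrote.

```python
import difflib
from pathlib import Path


def diff_configs(path_a, path_b):
    """Return unified-diff lines between two text config files."""
    a_lines = Path(path_a).read_text().splitlines()
    b_lines = Path(path_b).read_text().splitlines()
    return list(
        difflib.unified_diff(
            a_lines, b_lines, fromfile=str(path_a), tofile=str(path_b), lineterm=""
        )
    )


if __name__ == "__main__":
    # Assumed locations: adjust to your --output_dir and the generated file names
    output_dir = Path("rocket_outputs")
    for line in diff_configs(
        output_dir / "config_phase1.yaml", output_dir / "config_phase2.yaml"
    ):
        print(line)
```

The diff treats the files as plain text, so it needs no YAML parser and makes no assumption about which keys the configs contain.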


If you want to generate another set of default config files:

The --mode both flag sets up the default phase 1 and phase 2 workflow. You can edit the saved config files if you want to test a specific condition.

Optional: Multiple chains in the ASU

If you have multiple chains in the asymmetric unit, ROCKET does not currently refine all chains at once. The current workaround is to refine one chain while keeping the others fixed.

To use this mode, place the docked fixed chain file in ROCKET_inputs and rename it to {file_id}_added_chain.pdb:
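
The copy-and-rename step can also be scripted. This is a minimal sketch with a hypothetical helper, `stage_fixed_chain`; the location of the ROCKET_inputs directory is passed in by the caller because it depends on your preprocessing output.

```python
import shutil
from pathlib import Path


def stage_fixed_chain(docked_pdb, rocket_inputs_dir, file_id):
    """Copy a docked fixed-chain PDB into ROCKET_inputs under the expected name.

    Hypothetical helper, not part of ROCKET. Returns the path of the new file.
    """
    source = Path(docked_pdb)
    inputs_dir = Path(rocket_inputs_dir)
    if not source.is_file():
        raise FileNotFoundError(f"docked chain file not found: {source}")
    if not inputs_dir.is_dir():
        raise FileNotFoundError(f"ROCKET_inputs directory not found: {inputs_dir}")

    target = inputs_dir / f"{file_id}_added_chain.pdb"
    shutil.copyfile(source, target)  # copy, so the original docked model is kept
    return target


if __name__ == "__main__":
    # Example paths are placeholders: point them at your own files
    print(stage_fixed_chain("docked_chain_B.pdb", "ROCKET_inputs", "1lj5"))
```

Copying instead of moving keeps the original docked model in place in case you want to repeat the step with a different chain.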

Then update the refinement config:

4. Run refinement

Run the phase 1 config first:

This will start the refinement.

If you want live run tracking, see Track Refinement with Weights & Biases.

The standard workflow is phase 1 followed by the lower learning-rate phase 2:

Phase 2 requires an existing phase 1 folder. If you want to start with a lower learning rate, edit config_phase1.yaml directly.

5. Finalize geometry and B-factors

We recommend a short standard refinement run afterwards to polish the geometry and B-factors of the ROCKET output. We used phenix.refine in the paper.

A note from us

We hope to make ROCKET as useful and general as we can. If you run into setup issues, let us know and we will try to help.

Create an issue in our repo
