Setting up ROCKET with your own datasets -- X-ray crystallography
This tutorial shows how to refine a prediction against your own X-ray dataset with ROCKET.
We use PDB entry 1LJ5 as the example system.
This path is best when you already have experimental data, a target sequence, and precomputed alignments ready.
Note: Precompute your MSA files first. ROCKET currently expects a3m or sto alignments produced by an external server or database search. To generate them locally with OpenFold, follow the sequence database download instructions; the databases take about a terabyte of storage. The --precomputed_alignment_dir flag defaults to alignments/, and ROCKET uses every alignment it finds there.
1. Collect the required files
The ROCKET preprocessing script expects input files organized as follows:
<working_directory>/
├── {file_id}_fasta/
│   └── {file_id}.fasta      # FASTA file containing the chain to refine
│                            # Header should be "> {file_id}"
│
├── {file_id}_data/
│   ├── *.mtz                # For X-ray data
│   └── <optional files>/    # e.g., predicted or docked models
│
└── alignments/              # (default: --precomputed_alignment_dir)
    └── {file_id}/
        └── *.a3m / *.hhr    # MSA files for the input sequence
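Before running preprocessing, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of ROCKET: the function name `check_rocket_inputs` is hypothetical, and it assumes that at least one `.a3m` or `.sto` file counts as a usable MSA and that whitespace after `>` in the FASTA header is not significant.

```python
from pathlib import Path


def check_rocket_inputs(workdir, file_id, alignment_dir="alignments"):
    """Return a list of problems with the expected input layout (empty if none).

    Hypothetical helper for illustration; ROCKET itself does not provide it.
    """
    root = Path(workdir)
    problems = []

    # {file_id}_fasta/{file_id}.fasta with a header naming the file_id
    fasta = root / f"{file_id}_fasta" / f"{file_id}.fasta"
    if not fasta.is_file():
        problems.append(f"missing FASTA file: {fasta}")
    else:
        lines = fasta.read_text().splitlines()
        header = lines[0] if lines else ""
        # Assumption: "> id" and ">id" are treated the same here.
        if not header.startswith(">") or header[1:].strip() != file_id:
            problems.append(f"FASTA header should be '> {file_id}', got {header!r}")

    # {file_id}_data/*.mtz holds the X-ray data
    data_dir = root / f"{file_id}_data"
    if not data_dir.is_dir() or not any(data_dir.glob("*.mtz")):
        problems.append(f"no .mtz file found in {data_dir}")

    # alignments/{file_id}/ holds the precomputed MSA files
    msa_dir = root / alignment_dir / file_id
    has_msa = msa_dir.is_dir() and any(
        p.suffix in {".a3m", ".sto"} for p in msa_dir.iterdir()
    )
    if not has_msa:
        problems.append(f"no .a3m or .sto alignment found in {msa_dir}")

    return problems
```

Calling `check_rocket_inputs(".", "1lj5")` from the working directory returns an empty list when everything is where the tree expects it, and otherwise one message per missing piece.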
2. Run preprocessing
Once your files are organized, run rk.preprocess:
Note: If the MTZ file contains more than one useful column set, Phaser will choose the best one automatically. In most cases, intensities are preferred if they are available. If you want to force a specific column set, provide an MTZ file that only contains that data. You can inspect the file with:
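If no MTZ tool is at hand, the column labels can also be listed with a few lines of standard-library Python. This is a rough sketch and not a ROCKET or Phaser utility: the function name `read_mtz_columns` is hypothetical, and the code assumes the standard CCP4 MTZ layout (a 4-byte `MTZ ` signature, a 32-bit header offset counted in 4-byte words, and 80-character header records). Files that use the 64-bit header offset extension are not handled.

```python
import struct
from pathlib import Path


def read_mtz_columns(path):
    """Return (label, type) pairs from the COLUMN records of an MTZ header.

    Hypothetical helper; assumes the standard CCP4 MTZ header layout.
    """
    data = Path(path).read_bytes()
    if data[:4] != b"MTZ ":
        raise ValueError("not an MTZ file (missing 'MTZ ' signature)")

    # The header offset is 1-based and counted in 4-byte words.
    # Try little-endian first, then big-endian if the result is implausible.
    start = -1
    for fmt in ("<i", ">i"):
        offset = struct.unpack(fmt, data[4:8])[0]
        candidate = (offset - 1) * 4
        if 80 <= candidate < len(data):
            start = candidate
            break
    if start < 0:
        raise ValueError("could not locate the MTZ header")

    columns = []
    for pos in range(start, len(data), 80):
        record = data[pos:pos + 80].decode("ascii", errors="replace")
        if record.startswith("END"):
            break  # end of the main header block
        if record.startswith("COLUMN"):
            fields = record.split()
            columns.append((fields[1], fields[2]))  # label, type code
    return columns
```

In the MTZ type codes, `J` marks intensities, `F` marks amplitudes, and `Q` marks the associated standard deviations, so the returned pairs show at a glance which column sets are present.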
3. Review the generated configs
rk.preprocess generates two YAML files under --output_dir for rk.refine. Review them before you start refinement.
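One simple way to review both files side by side is to print them from Python. The sketch below is illustrative only: the helper names are hypothetical, it assumes the configs carry a `.yaml` extension somewhere under the output directory, and it prints the raw text rather than interpreting any keys.

```python
from pathlib import Path


def find_configs(output_dir):
    """Return all .yaml files under output_dir, sorted by path."""
    return sorted(Path(output_dir).rglob("*.yaml"))


def show_configs(output_dir):
    """Print each config file found under output_dir and return the paths."""
    configs = find_configs(output_dir)
    if len(configs) != 2:
        # The tutorial describes two generated files; flag anything else.
        print(f"warning: expected 2 config files, found {len(configs)}")
    for path in configs:
        print(f"==== {path} ====")
        print(path.read_text())
    return configs
```

Pass the same directory you gave to --output_dir, for example `show_configs("path/to/output_dir")`.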
In most cases, the default phase 1 and phase 2 configs are a good place to start.
If you want to generate another set of default config files:
The --mode both flag sets up the default phase 1 and phase 2 workflow. You can edit the saved config files if you want to test a specific condition.
Optional: Multiple chains in the ASU
If you have multiple chains in the asymmetric unit, ROCKET does not currently refine all chains at once. The current workaround is to refine one chain while keeping the others fixed.
To use this mode, place the docked fixed chain file in ROCKET_inputs and rename it to {file_id}_added_chain.pdb:
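The copy-and-rename step can also be scripted. This is a minimal sketch with a hypothetical helper name, `add_fixed_chain`; it assumes you pass the path to the existing ROCKET_inputs directory yourself, and it copies rather than moves so the original docked model stays where it was.

```python
import shutil
from pathlib import Path


def add_fixed_chain(docked_pdb, rocket_inputs_dir, file_id):
    """Copy a docked fixed-chain model to {file_id}_added_chain.pdb.

    Hypothetical helper for illustration; returns the destination path.
    """
    src = Path(docked_pdb)
    if not src.is_file():
        raise FileNotFoundError(f"docked model not found: {src}")

    dest_dir = Path(rocket_inputs_dir)
    if not dest_dir.is_dir():
        # Do not create the directory here: it is expected to exist already.
        raise NotADirectoryError(f"ROCKET_inputs directory not found: {dest_dir}")

    dest = dest_dir / f"{file_id}_added_chain.pdb"
    shutil.copy2(src, dest)  # keeps the original file in place
    return dest
```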
4. Run refinement
The standard workflow is phase 1 followed by phase 2, which uses a lower learning rate:
Phase 2 requires an existing phase 1 folder. If you want to start with a lower learning rate, edit config_phase1.yaml directly.
5. Finalize geometry and B-factors
After ROCKET finishes, we recommend a short standard refinement run to polish the geometry and B-factors of the ROCKET output. We used phenix.refine in the paper.
A note from us
We hope to make ROCKET as useful and general as we can. If you run into setup issues, let us know and we will try to help.