Launch with Your Own X-ray Data

Setting up ROCKET with your own datasets -- X-ray crystallography

In this tutorial, we walk you through refining a prediction against your own X-ray dataset with ROCKET.

We use the PDB ID 1LJ5 system as an example.

Note: Users must precompute their MSA files (a3m or sto format required) from a server or a database beforehand. To use OpenFold, follow the sequence database download instructions (this requires disk space on the order of terabytes!). The --precomputed_alignment_dir (default: alignments/) is currently required, and all alignment files in that folder will be used. We are working on integrating a sequence server into the pipeline.

πŸ›°οΈ First things first – gather your data

The ROCKET preprocessing script expects input files organized as follows:

<working_directory>/
├── {file_id}_fasta/
│   └── {file_id}.fasta       # FASTA file containing the chain to refine
│                             # Header should be "> {file_id}"
│
├── {file_id}_data/
│   ├── *.mtz                 # For X-ray data
│   └── <optional files>/     # e.g., predicted or docked models
│
└── alignments/               # (default: --precomputed_alignment_dir)
    └── {file_id}/
        └── *.a3m / *.hhr     # MSA files for the input sequence
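Before running preprocessing, it can help to confirm that the layout above is in place. The following is a minimal sketch of such a check, not part of ROCKET: the function name check_rocket_inputs is hypothetical, it only mirrors the directory tree shown above, and it assumes that whitespace after ">" in the FASTA header is not significant.

```python
from pathlib import Path


def _has_match(directory, patterns):
    """Return True if `directory` exists and holds a file matching any pattern."""
    directory = Path(directory)
    if not directory.is_dir():
        return False
    return any(True for pattern in patterns for _ in directory.glob(pattern))


def check_rocket_inputs(working_dir, file_id, alignment_dir="alignments"):
    """Return a list of human-readable problems found in the input layout.

    Hypothetical helper (not a ROCKET API): it checks only the files named in
    the directory tree of this tutorial and never calls ROCKET itself.
    """
    root = Path(working_dir)
    problems = []

    # {file_id}_fasta/{file_id}.fasta, whose header should be "> {file_id}"
    fasta = root / f"{file_id}_fasta" / f"{file_id}.fasta"
    if not fasta.is_file():
        problems.append(f"missing FASTA file: {fasta}")
    else:
        lines = fasta.read_text().splitlines()
        header = lines[0] if lines else ""
        # Assumption: ">{file_id}" and "> {file_id}" are treated alike.
        if not header.startswith(">") or header[1:].strip() != file_id:
            problems.append(f"FASTA header is {header!r}, expected '> {file_id}'")

    # {file_id}_data/ should contain at least one MTZ file
    if not _has_match(root / f"{file_id}_data", ["*.mtz"]):
        problems.append(f"no *.mtz file found in {root / (file_id + '_data')}")

    # alignments/{file_id}/ should contain MSA files (extensions named in the text)
    msa_dir = root / alignment_dir / file_id
    if not _has_match(msa_dir, ["*.a3m", "*.sto", "*.hhr"]):
        problems.append(f"no MSA files found in {msa_dir}")

    return problems
```

Calling check_rocket_inputs(".", "1lj5") from the working directory returns an empty list when all three parts of the layout are found, and otherwise one message per missing piece. The file ID here is only an illustration based on the example system.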

Run ROCKET Preprocessing

Once your data is organized in this way, you are ready to run rk.preprocess. For example:

Note: If the MTZ file provided contains more than one set of relevant columns (e.g. intensities and errors + structure factor amplitudes and errors), Phaser will pick the best set for you – it's likely better to work from the intensities if available! If you absolutely want to use a specific set of columns, you can provide an MTZ that contains only that data. Run a quick investigation of your MTZ file by:
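One way to see which column sets an MTZ file holds is sketched here in Python. This is not the command this tutorial refers to. It is an alternative that assumes the third-party gemmi package is installed, and the function names are hypothetical. The one-letter codes are the standard CCP4 MTZ column types (J for intensities, F for amplitudes, Q for standard deviations, and so on).

```python
def summarize_mtz_columns(columns):
    """Group MTZ column labels by the kind of data they hold.

    `columns` is an iterable of (label, type_code) pairs, where type_code is
    the one-letter CCP4 MTZ column type. Other types (e.g. H for Miller
    indices, P for phases) are ignored in this sketch.
    """
    kinds = {
        "J": "intensities",
        "K": "anomalous intensities",  # I(+) / I(-)
        "F": "amplitudes",
        "G": "anomalous amplitudes",   # F(+) / F(-)
        "Q": "sigmas",
        "M": "sigmas",                 # sigma of I(+) / I(-)
        "L": "sigmas",                 # sigma of F(+) / F(-)
        "I": "integer flags",          # e.g. a free-R flag column
    }
    summary = {}
    for label, type_code in columns:
        kind = kinds.get(type_code)
        if kind is not None:
            summary.setdefault(kind, []).append(label)
    return summary


def read_mtz_columns(path):
    """Read (label, type) pairs from an MTZ file using gemmi."""
    import gemmi  # imported lazily so summarize_mtz_columns has no dependencies

    mtz = gemmi.read_mtz_file(str(path))
    return [(col.label, col.type) for col in mtz.columns]
```

With gemmi available, summarize_mtz_columns(read_mtz_columns("your_data.mtz")) shows at a glance whether the file carries intensities, amplitudes, or both, which is the situation the note above describes. The file name is a placeholder.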

Edit YAML File

rk.preprocess automatically generates two YAML files under --output_dir for the subsequent rk.refine run. Users are expected to review and edit these YAML files directly.

If you want to generate another set of default config files:

The --mode both flag in the command above sets up ROCKET to run its default "aggressive" phase1 followed by a fine-tuning phase2. You can optionally modify the saved config files if you want to test a specific condition for model building.
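Since the config files are meant to be edited by hand, it may be worth keeping an untouched copy of the generated defaults. The sketch below is a hypothetical convenience helper, not part of ROCKET. It assumes only that the generated configs are *.yaml files sitting directly in the output directory.

```python
import shutil
from pathlib import Path


def backup_yaml_configs(output_dir, suffix=".orig"):
    """Copy every *.yaml file in output_dir to <name>.yaml<suffix>, once.

    Hypothetical helper (not a ROCKET API). Existing backups are left
    untouched, so the first, unedited version is never overwritten.
    Returns the list of backup files created by this call.
    """
    created = []
    for config in sorted(Path(output_dir).glob("*.yaml")):
        backup = config.with_name(config.name + suffix)
        if not backup.exists():
            shutil.copy2(config, backup)
            created.append(backup)
    return created
```

Running this once right after preprocessing and before any manual edits keeps the defaults around for comparison, for example with a diff against the edited file.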

(Optional) Multiple Chains in the ASU

If you have multiple chains in the asymmetric unit (ASU), ROCKET does not currently support refining all chains simultaneously. As a workaround, ROCKET allows you to refine one chain while keeping the others fixed.

To use this functionality, place the PDB file containing the docked fixed chain(s) in the ROCKET_inputs folder and rename it to {file_id}_added_chain.pdb. The input folder would then look like:
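The copy-and-rename step described above can also be scripted. This is a minimal sketch under stated assumptions: the function name stage_fixed_chains is hypothetical, and the location of the ROCKET_inputs folder is passed in explicitly rather than guessed.

```python
import shutil
from pathlib import Path


def stage_fixed_chains(docked_pdb, rocket_inputs_dir, file_id):
    """Copy a docked PDB into rocket_inputs_dir as {file_id}_added_chain.pdb.

    Hypothetical helper (not a ROCKET API). The original file is left in
    place; only a renamed copy is written. Returns the path of the copy.
    """
    source = Path(docked_pdb)
    if not source.is_file():
        raise FileNotFoundError(f"docked model not found: {source}")

    inputs = Path(rocket_inputs_dir)
    if not inputs.is_dir():
        raise NotADirectoryError(f"ROCKET_inputs folder not found: {inputs}")

    target = inputs / f"{file_id}_added_chain.pdb"
    shutil.copy2(source, target)
    return target
```

Copying rather than moving keeps the docked model available under its original name in case the same file is needed for another run.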

Then, in your refinement config YAML file, change the following two lines:

Iteratively Refine Predictions

Run the phase1 config file you generated or edited above, e.g.:

This should get you going with the refinement!

The standard protocol is to run the default phase1 and then follow it with the default phase2, which uses a lower learning rate, e.g.:

phase2 requires an existing phase1 folder to start from. If you would like to experiment with only a lower learning rate from the start, you could edit the config_phase1.yaml accordingly.

Finalize geometry and B-factors

We recommend a brief run of standard refinement (phenix.refine was used in the paper) to refine B-factors and polish the geometry of the models that come straight out of ROCKET.

A note from us

We hope to make ROCKET as useful and general as we can – please let us know if you run into issues setting up your own refinements and we'll see how we can help!

Create an issue in our repo
